A Method for Predicting the Therapeutic Outcome of a Treatment for an
Affective Disorder

This application claims the benefit of U.S. Provisional Application S.N.
60/041,287, filed on March 20, 1997, the disclosure of which is
incorporated herein by reference.

FIELD OF THE INVENTION 

A method is provided for facilitating the selection of a treatment regime
and for monitoring the outcome of a particular treatment regime on a
disease, based upon the expected outcome. The treatment is selected from a
group of possible treatments based upon the pre-treatment diagnostic data
where more than one treatment regime could be selected. The method finds
utility, for example, in the treatment and monitoring of disease states
wherein the symptoms of the disease can result from more than one
physiological condition.

BACKGROUND OF THE INVENTION 

While the method of the instant invention is useful for treatment
selection for more than one type of disorder that is diagnosed and treated
based upon symptoms, for simplicity the discussion herein addresses
treatment selection for a disorder in which the diagnosis is made by a
physician based upon somatic symptoms, such as, for example, depression
and especially unipolar depression.

Recent studies suggest that in the United States about 6-10% of the
population exhibit varying symptoms of depression, which costs society
billions of dollars annually. Depression is an affective mental health
disorder which is diagnosed based upon descriptive criteria or somatic
symptoms set forth in the Diagnostic and Statistical Manual of Mental
Disorders (DSM-IV) (APA, 1994). The severity of the disorder is assessed
using the Hamilton Depression Rating Scale (HDRS) (Hamilton, 1960), a
clinical instrument that evaluates twenty-one psychological, physical, and
performance deficits. Many different malfunctions may give rise to the
same set of somatic symptoms, and the physiological basis for these
malfunctions is not thoroughly understood. Thus, it is difficult to
determine the correct treatment regime for a patient.

In clinical research studies which are performed to assess the effect 
of a treatment, pre-treatment or baseline scores and post-treatment scores 
are typically compared. Several prior research efforts focused on the 
recovery pattern of depression symptoms as assessed using 


In 1984, Quitkin {Quitkin:84} analyzed the patterns of general
improvement in depressed patients in response to treatment with drug
therapy. He compared four antidepressant drug treatments with a placebo
(N=318). The results showed that a "true drug response" was indicated by
a pattern of delayed and persistent improvement. The delay was up to 4
weeks, but once improvement started it continued. These results were
replicated by Quitkin et al. in 1987 {Quitkin:87}. They used a
measurement of overall general improvement in the patient's condition
(CGI: Clinical Global Impression scale).

Katz et al. (1987) {Katz:87} found that specific changes in
symptoms after one week of treatment were predictors of response to
imipramine and amitriptyline treatments in bipolar and unipolar
patients (N=104). As the symptom measure, they used "state
constructs," which included HDRS as one of their measurements.
According to their analysis (analysis of covariance), these
measurements indicated week-one predictive symptoms to be a reduction
in disturbed affects (distressed expression and anxiety (p < 0.001);
depressed mood, hostility and agitation (p < 0.01)); and cognitive
functioning (cognitive impairment (p < 0.01)). These symptoms
were the ones that dropped early and were predictive of the outcome.
Sleep disorder also dropped from an early stage, but was not predictive
of the outcome because it dropped non-differentially in both responders
and non-responders. Retardation dropped later in responders, only after
the predictive symptoms had dropped.

The advantages of time series analysis were illustrated by Hull et al.
{Hull:93} in documenting the treatment effects of fluoxetine in a 58 week
in-patient trial. The data analyzed were from a self-report symptom
scale obtained for a single patient (N=1). Forty weeks of pre-treatment
data were available for the analysis. The amount of data obtained was
sufficient for time series (intervention) analysis of the time course of
depression symptoms. The data before intervention were best fit by the
model identified as (AR, I, MA) = (0, 1, 1). This is a first order moving
average model that operates on the first difference of the time
series data. Eight "dummy" variables corresponding to the intervention
were then introduced. Each was a step function that changed from zero
to one at week i after intervention (i = 0, 1, ..., 7). Most
symptom scores dropped significantly during the second week. The most
noticeable was depression (p < 0.001). Some symptom scores showed
additional drops by the fourth week. Psychoticism, characterized by
delusions or hallucinations, was an exception, in that its primary
response occurred during the first week.

Recently, a method of diagnosing or confirming a diagnosis of
depression has been developed by Goldstein et al. (U.S. Patent No.
5,591,588; Goldstein et al.; the disclosure of which is incorporated herein
by reference). Based upon laboratory determined blood values of the
neurohormone arginine vasopressin and of the thymic hormone thymopoietin,
taken from blood samples obtained from patients in the afternoon, and using
a logistic regression model which was confirmed using a linear
discriminant analysis, this diagnostic criterion was found to be accurate
in 81% of the patients who were diagnosed as depressed using the Hamilton
Depression Rating Scale.

The above described methods are useful for characterizing and
diagnosing an affective disorder. However, assignment of a treatment based
upon the diagnosis and characterization of the disorder is not achieved by
these methods. Further, once a treatment is assigned to a patient based
upon the currently used methods, no treatment specific recovery pattern is
available to monitor the progress achieved by the patient at various time
points of treatment in between pre- and post-treatment assessment.

The time resolution of the measurements is coarse. Data are collected
weekly at best. Frequently data points are missing. Further, patient data
gathered are rated on a five point scale and are qualitatively assessed. The
population studied may not be representative of the entire range of the
disorder; it may not be normally distributed in a statistical sense. In
particular, the patient's progress is not compared with the pattern of
recovery shown by patients who have received similar treatment regimes
and who have been determined to be "recovered" based on HDRS with respect
to the time course of the disappearance of symptoms.

Several treatment regimes have proven effective in treating
depression when pre- and post-treatment are compared, but the response to
the various treatments is highly variable. Within a group of patients all
assessed to have the same HDRS score, response to the same treatment is
highly variable. Some people respond in the expected manner, while others
do not. Further variability is added in that some patients respond in the
same manner to different treatments. These treatments include
psychotherapy, such as cognitive behavioral therapy (CBT), and/or drug
treatment, such as with a tricyclic antidepressant drug (TCA), for example
desipramine (DMI), or with a selective serotonin reuptake inhibitor, for
example fluoxetine (FLU). Each treatment has proven successful with a
certain subset of patients exhibiting somatic symptoms of depression
derived from the Hamilton Depression Rating Scale. However, identification
of members of a subset prior to the onset of treatment is difficult. Thus,
optimal treatment selection is difficult for any given individual.

Currently, once a patient is diagnosed as having depression and the
severity of the disorder is assessed using the Hamilton Depression Rating
Scale (HDRS), a single total score is obtained based upon a series of
somatic indicators. Using the HDRS score, the doctor selects one
treatment regime from among several possible treatment regimes. The
choice of treatment has been based on the absence of undesirable side
effects and on the training background of the clinician rather than on
knowledge of the potential efficacy of the treatment regime for the patient.
Trial and error methods of treatment assignment have proven to have
limited success. Previous attempts at using statistical techniques to
predict the outcome of treatment for depression have also proven to be
weak indicators. A model with predictive value is needed to facilitate
successful selection of a treatment regime for a patient exhibiting
symptoms of varying severity associated with depression.

Once the patient starts treatment, monitoring of the recovery process
is performed qualitatively by the physician's assessment of the patient's
rate of recovery. This assessment is based upon the physician's previous
experience of recovery patterns from other individual patients. However,
this experience is limited. What is needed is a method for monitoring the
patient's recovery with time that would allow early detection of deviation
from an expected recovery path, where the recovery path is derived from a
larger population sample. This would provide the physician with a more
accurate predictor of the outcome of treatment. By comparing the
individual's response to a representative response which resulted in
recovery, the physician would be provided with a more rapid way to
re-evaluate the treatment and, if needed, to alter the treatment regime,
thus facilitating patient recovery.

However, patient recovery is very idiosyncratic and highly variable.
Thus, establishing predictive patterns of recovery has been thought to be
unfeasible. Further, the pattern of recovery of any individual patient is
thought to be unique to that patient. Therefore, comparing any
individual's recovery pattern with a predicted recovery pattern has been
considered to have very limited usefulness. What is needed is a model
which allows for variability while providing predictive value.

Due to the variability of the data, confounded by the idiosyncratic
response of patients to the assigned treatment, analysis of the data in order
to assign treatment and predict the outcome of that treatment, much less
to monitor the patient's progress in response to the treatment so that early
intervention and alteration of treatment can be achieved, has proven
difficult. What is needed is a system to analyze the data which provides the
physician with a method to predict and monitor the outcome of treatment.

It is an object of the instant invention to provide a method for
standardizing the assignment of a treatment for a disorder, such as, for
example, depression.

It is a further object of the instant invention to provide a method for
monitoring the effectiveness of an assigned treatment for a disorder which
is diagnosed and monitored based upon symptoms assessed at various time
intervals.

It is an additional object of the instant invention to facilitate more
timely intervention by the physician with respect to treatment choice when
treatment is not progressing as expected.

SUMMARY 

The invention relates to a method useful for facilitating the choice of a
treatment or treatment regime and for predicting the outcome of a
treatment for a disorder which is diagnosed and monitored by a physician or
other appropriately trained and licensed professional, such as, for example,
a psychologist, based upon the symptoms experienced by a patient. Unipolar
depression is an example of such a disorder; however, the model may find use
with other disorders and conditions wherein the patient response to
treatment is variable.

Further, the method provides a modeling system for generating the
expected recovery pattern of a patient receiving a particular treatment,
which is useful for comparison with the actual recovery pattern of the
patient to provide for monitoring of the patient's response. The expected
recovery pattern is particularly one that has been generated by the recovery
model of the instant invention. When the patient's response does not
correspond to the predicted recovery pattern, the treatment regime can be
re-evaluated.

The preferred recovery model is a non-linear, second order neural
network model for analyzing data to generate expected outcomes from a
plurality of individual patterns of response. Also provided is a data
system which integrates individual responses and, through analysis by the
model, provides a generalized expected pattern of outcome in response to
the treatment when a particular pattern of symptoms is exhibited.

A processing unit that weights the inputted patient data is provided.
The weight depends upon the strength of the effect. At each point in time
each unit of data has an activation value. The activation value is passed
through a function to produce an output.

Each patient's recovery pattern is represented as a second order
differential equation. The recovery pattern characteristics are represented
by three parameters: latency (change with time), or when patient response
begins within a six week treatment regime; interaction effects, or how each
of seven symptoms influences the others; and treatment effects, or how each
treatment affects each symptom. Symptoms are simplified for analysis and
include the parameters early sleep; middle and late sleep; energy; work;
mood; cognitions; and anxiety. Responders are defined as those patients who
exhibit an improvement of greater than 50% during the treatment period.
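
The responder criterion above can be expressed as a simple check. The
following is a minimal sketch, assuming that improvement is measured as the
fractional drop in a total severity score from baseline to the end of
treatment; the function name is_responder and its arguments are
illustrative and do not appear in the source.

```python
def is_responder(baseline_score, final_score, threshold=0.5):
    """Return True when fractional improvement exceeds the threshold.

    Assumption (not from the source): scores are non-negative severity
    totals, so improvement is (baseline - final) / baseline.
    """
    if baseline_score <= 0:
        # Nothing to improve on; treat as non-responder by convention.
        return False
    improvement = (baseline_score - final_score) / baseline_score
    # "Greater than 50%" is a strict inequality.
    return improvement > threshold
```

Under this reading, a patient whose score falls from 24 to 10 counts as a
responder, while a fall from 24 to 12 (exactly 50%) does not.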

The recovery model takes into account latency, treatment effects, and
the interaction of the treatment effects. Time to response is also modeled.
The model is trained to optimize the parameter values. The model output,
which is based upon the estimated parameters and the pretreatment
symptoms, is compared to the desired patient data over a six week period of
time on a day by day basis. The parameter estimates are adjusted so that
the difference between the model output and the patient data decreases.
This process is repeated until the parameters are optimized and thereby
yield a model and output that best fit the patient data.
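
The iterative adjustment described above can be illustrated with a
deliberately simplified sketch. It does not implement the second order
recovery model; as a stand-in it assumes a toy one-parameter exponential
decay for a single symptom and plain gradient descent on the mean squared
difference between model output and daily patient data. The names
model_output and fit_rate, the learning rate, and the iteration count are
all assumptions made for illustration.

```python
import numpy as np

def model_output(rate, baseline, t_weeks):
    """Toy stand-in model: one symptom decays exponentially from baseline."""
    return baseline * np.exp(-rate * t_weeks)

def fit_rate(patient_daily, baseline, rate=0.1, learning_rate=0.05, n_iter=2000):
    """Repeatedly adjust `rate` so the model output approaches the data."""
    patient_daily = np.asarray(patient_daily, dtype=float)
    t_weeks = np.arange(len(patient_daily)) / 7.0  # day index in weeks
    for _ in range(n_iter):
        predicted = model_output(rate, baseline, t_weeks)
        error = predicted - patient_daily
        # Gradient of mean(error**2) with respect to rate, using
        # d(predicted)/d(rate) = -t_weeks * predicted.
        gradient = np.mean(2.0 * error * (-t_weeks) * predicted)
        rate -= learning_rate * gradient
    cost = np.mean((model_output(rate, baseline, t_weeks) - patient_daily) ** 2)
    return rate, cost

# Synthetic six-week daily series (days 0..42) generated with rate 0.5 per week.
days = np.arange(43)
synthetic = np.exp(-0.5 * days / 7.0)
fitted_rate, final_cost = fit_rate(synthetic, baseline=1.0)
```

The same compare-and-adjust loop applies when the model has many
parameters; only the model function and the gradient computation change.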

The model can gain additional accuracy and precision through entry of
additional patient data which is integrated into the model. Increased
precision can be achieved by collecting patient data on a continuous basis
from clinical studies and from physicians and psychologists, inputting the
data, and updating the model. Thus, in an aspect of the invention, a method
is provided for integrating data to provide treatment patterns that have
greater predictive value than that typically available to an individual
physician.

Further, a method is provided for comparing individual patient
response to a predicted outcome, thereby allowing the physician the ability
to monitor the patient's response with time and to assess whether or not
the treatment is resulting in the expected improvement in the disorder.
When the expected improvement is not observed, the physician then can
intervene and alter the treatment.

Additionally, the invention provides a method for inputting data from
patients, integrating that data into a data system to modify the expected
recovery pattern for a particular symptom set and for a particular
treatment or treatment regime and thereby provide a predictive pattern of
recovery for individual patterns of symptoms and responses to treatment
that has greater predictive value.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1a illustrates a flow chart for a prototypical symptom profiler.
Figure 1b illustrates a flow chart of the system architecture for Phase I.
Figure 1c illustrates a flow chart for a patient data processing unit.
Figure 2 illustrates a flow chart for a depression disorder integrated
model.
Figure 3-2 illustrates a flow chart for a training cycle for training a
model on actual patient data.
Figure 3-3 illustrates an overview of a recovery model and the parameters
used therein.
Figure 3-4 illustrates the annotated second order differential equation
used to model the pattern of recovery.
Figure 3-5 illustrates latency modeling.
Figure 3-6 illustrates direct effects and interactions of the recovery
model.
Figure 3-7 provides an overview of the training process.
Figure 3-8 provides a schematic description of an equation useful for
training the model.
Figure 3-9 illustrates predicted patterns of recovery vs. actual patterns
of recovery based upon two different modeling systems.
Figure 3-10 illustrates individual patterns of recovery for four patients,
wherein patients a and b receive CBT and patients c and d receive DMI.
Figure 3-11 illustrates predicted and actual patterns of patient data
based upon the mean values.
Figure 3-12 illustrates mean half reduction time based upon the model's
predicted values of latency for individual symptom factors.
Figure 3-13 graphically illustrates the predicted CBT and DMI temporal
response sequence of symptom improvement in patients diagnosed as having
depression.
Figure 3-14 illustrates a comparison of the model's predicted immediate
and delayed direct effects of treatment on symptoms for CBT and DMI
treatment.
Figure 3-15 graphically illustrates a representation of the sequence of
symptom factors in recovery with CBT treatment for the second order model
system.
Figure 3-16 graphically illustrates a representation of the sequence of
symptom factors in recovery with DMI treatment for the second order model
system.
Figure 3-17 graphically illustrates a sequence and causal relationship
among patterns of recovery.
Figure 4-1 graphically illustrates nonlinear mapping of backpropagation.
Figure 4-2 provides a schematic representation of the effect of normalizing
transformations on reducing nonlinearity of score-to-output relationships.

DESCRIPTION OF THE BEST MODE OF THE INVENTION

Factors for analysis of recovery patterns were selected from the
Hamilton Depression Rating Scale (HDRS). Three types of factors, physical,
performance and psychological, were included. Generally described, these
factors include: early sleep; middle and late sleep; energy; work
performance; mood; cognitions; and anxiety. General methods used for
statistical tests for verification of the modeling efforts, as modified for
use with a neural net model to correct for over-fitting, are described by
Luciano (Luciano; U.S. Provisional Patent Application S.N. 60/041,287, filed
March 21, 1997). Also described therein are time series prediction
verification methods to validate the results obtained, and outcome
prediction verification methods.

Referring now to Figures 1a and 1b, a symptom profiler developer and a
system architecture for Phase I (idealized profile development),
respectively, are illustrated. Figure 1a provides an overview of the
development of the symptom profiler. A prototypical system is developed to
provide expected or so-called idealized profiles or patterns of symptoms
over time in response to a selected treatment regime. These patterns are
based upon actual clinical data derived from individual patient responses to
a selected treatment. Clinical data are input from multiple sources. The
data are pre-processed and undergo statistical tests as illustrated in
Figure 1c; some tests are standard and some are modified according to the
methods described in detail below. The data are processed until the
profiles are optimized on the data available at that time to create a trained
symptom profiler. Completion of the training process of the system is then
assessed based upon optimization of the preprocessed steps. Figure 1b
presents an overview of how the system can be used and modified to further
optimize the system for providing treatment recommendations and predicted
responses. The trained symptom profiler contains a database of
predicted responses. A user, such as a physician, enters patient
data, for example via a computer, into the trained symptom profiler
and receives a treatment recommendation and a profile of predicted
responses to that treatment. Access to the trained symptom profiler
optionally is through the Internet. Further, individual patient data and data
from clinical studies may be input to the symptom profiler for on-going
training of the symptom profiler.

Referring now to Figure 2, a flow chart of a depression disorder 
integrated model (DDIM) is illustrated. After depression has been broadly 
diagnosed using DSM-IV, data are gathered from the patient using an 
instrument based upon the Hamilton Depression Rating Scale which is 
described below. During the treatment selection phase, these data are 
entered into the Outcome Predictor which provides a database of predicted 
outcomes in response to multiple treatments by comparing the patient's 
data to predicted outcomes based upon the information in the trained 
Outcome Predictor. The physician uses this information to choose the 
treatment most likely to produce the desired results, i.e. improvement in 
symptoms of depression. The physician monitors the patient's response to 
treatment and compares that response to a predicted response generated by 
the trained Pattern Predictor. When the patient's response deviates from the
expected response, the physician may alter the treatment regime assigned 
to the patient being treated. 

How the symptoms of depression, as assessed by the Hamilton
Depression Rating Scale (HDRS), change over time in response to treatment
was studied to provide detailed patterns of recovery over time. A series of
analyses of two groups of patients who responded to a particular treatment
regime was performed. One group of six patients responded to treatment
with desipramine (DMI), an antidepressant drug, and the other
group of six patients responded to treatment with cognitive behavioral
therapy (CBT), a psychotherapy treatment. The detailed patterns of
recovery in each of these patient groups were studied and modeled
using systems of ordinary differential equations. This method revealed
new information about how the symptom response patterns differ across
treatments.

A direct approach to fitting more than one patient's recovery data over
time has not been previously attempted. The problems which must be
overcome are the high level of noise and the inter-subject variation in
recovery. Also lacking is a detailed model which uses the subject's initial
data as a starting point. The instant invention describes a differential
equation model which partially deals with this problem. Another problem
has to do with the large amount of variance that remains after the best
fitting model is constructed. Some of this variability is unavoidable and is
due to defects in the measuring instrument. The model is shown to capture
a significant part of the variance of the subjects' data.

The statistical reliability of the model's predictions over the two
patient groups in recovery is demonstrated. From this model, which is based
upon a database comprised of data points gathered from assessment of
individual patients over time, clear predictions as to the timing of recovery
within and between treatments can be made which can be further validated
and extended by additional research data inputted into the database.

To understand and explain, rather than just describe, how treatments
affect recovery, more detail about the pattern of recovery than previously
described was sought. The aim was to build upon the pattern of drug
response that Quitkin et al. (Quitkin 1984; 1993) described by following
specific symptoms over time rather than a single indicator of global
improvement. It was also sought to connect the snapshots described by Katz
(Katz:87) and show how they relate to outcome. To do this, a sample of
patients who responded to treatment was selected, and then a set of
quantitative rules which describe the evolution of symptoms during
recovery was estimated. Thus the resultant model is able to predict the
detailed pattern of recovery from the pre-treatment symptoms. The fit of
the model to the data is described in Section Qualitative Reasons for
Choice of Second Order System. This work also extended the work of Hull
et al. (Hull, 1993) in that we had a larger sample of patients. In Hull,
each symptom was modeled independently as an ARIMA process, not allowing
for interactions among symptoms. We allowed for interactions among
symptoms, which enabled a more detailed analysis of the recovery sequence.

METHODS 

Models of Patient Group Response Over Time

Much of an individual's pattern of recovery appears predictable from the
subject's initial data, even though there is considerable idiosyncratic
variation from subject to subject. In order to capture as much
individual variation within a treatment group as possible,
and to compare the differing responses across groups, the problem was
defined as follows: Are there any differences in how symptoms improve in
depressed patients who respond to cognitive behavioral therapy vs. those
who respond to desipramine? The approach taken was to recast the problem
as a dynamic system. Recovery patterns for patients were modeled using
differential equations, wherein the differential equation parameters were
specific to a treatment group. A comparison of the features of recovery
patterns was made to examine latency of response to treatment. A
determination of which symptoms were the first to respond to treatment was
made. Further, whether or not the symptoms affect each other was
evaluated. Then, statistical analysis was applied to determine the
significant differences in the model predicted recovery pattern features
found in the different treatment groups.

To accomplish this, an architecture or network of connections
among variables corresponding to symptoms and the treatment input was
constructed. Then two separate types of models of this architecture,
namely a shunting model and a second-order model, so named because of
the kind of differential equations that define the model, were constructed.
Then, for each of these two types of models, the data were used to
estimate a different set of parameters for each treatment group,
DMI and CBT. Thus, parameters were estimated for four separate models
(two treatment groups by two model types). The parameters were
estimated iteratively by cycling through the individual data within
the treatment group as shown in Figure 1, which describes the training
cycle for each treatment group. Referring to Figure 1, for each of the two
different models, the same architecture but separate parameter
sets were provided. Each model was trained by cycling through individual
data within the respective treatment group. After each cycle, the cost
function, which reflects the degree of fit of the model predictions to the
actual data, was evaluated to determine the completion of training.
Finally, we analyzed the parameters and behavior of the trained models
when initialized with each individual patient's baseline data values. In
this way, the reliability of the predicted behavior within and across
treatment groups was quantified.

Each model was fit to the seven constructed symptom factors
derived from the Hamilton Depression Rating Scale. Three primary
characteristics of the response pattern were studied: (1) direct effects
(from treatment to symptoms); (2) interaction effects (between pairs of
symptoms, which are indirect because they are not directly caused by
the treatment); and (3) latency, which is the average time that elapses
from the start of the treatment to a 50 percent improvement in the
symptoms.
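
The latency measure in item (3) can be sketched as follows. This is an
illustrative reading only: it assumes raw daily severity scores in which
lower values mean improvement, takes the first entry as the pre-treatment
baseline, and averages over those patients who reach the 50 percent mark.
The function names are hypothetical.

```python
def half_reduction_day(daily_scores):
    """First day index at which the score is at most half of baseline.

    Returns None when the 50 percent improvement is never reached.
    """
    baseline = daily_scores[0]
    if baseline <= 0:
        return None
    for day, score in enumerate(daily_scores):
        if score <= 0.5 * baseline:
            return day
    return None

def mean_half_reduction_day(patients):
    """Average half-reduction day over patients who reach it."""
    days = [half_reduction_day(p) for p in patients]
    reached = [d for d in days if d is not None]
    return sum(reached) / len(reached) if reached else None
```

How patients who never reach the 50 percent mark should be treated is a
design choice the source does not specify; here they are simply excluded
from the average.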

Each model was designed so that its output could be easily related to
the evolving symptom factor values. To accomplish this, the network
architecture was specified to have one variable for each of the seven
symptom factors under study. The direct effects of treatment and the
interactions among symptoms were represented as modifiable connections
from treatment to symptom factor variables and between the symptom
factor variables. In addition to the above, a latency variable was
introduced to represent varying symptom response time (the time it
takes for symptoms to respond to treatment).

Differential equations were used to describe the dynamics of the
model. Two systems of differential equations were studied. One was a
second order linear system; the other was a shunting system
(Grossberg:82c), based on first order non-linear differential equations.

After the architecture for the model was constructed, parameters
were estimated using the learning algorithm described in Section
Training Procedure, which was adopted from optimal control theory. The
optimized models are compared for goodness of fit. The parametric
differences in latency, treatment effects (both immediate and delayed),
and interactions between symptoms are discussed.
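
For orientation only, the two families of differential equations named
above have the following generic textbook forms. These are not the specific
equations of the model (the annotated second order equation is the subject
of Figure 3-4). Every symbol here is an illustrative assumption: x_i is the
value of symptom factor i, u is the treatment input, w_ij is a coupling
from factor j to factor i, and a_i, b_i, c_i, A, B, C are generic
coefficients, with E_i and I_i the excitatory and inhibitory inputs of the
shunting form.

```latex
% Generic second order linear system with treatment input and couplings
\ddot{x}_i + a_i \dot{x}_i + b_i x_i = c_i u + \sum_{j \neq i} w_{ij} x_j

% Generic shunting (first order, non-linear) equation
\dot{x}_i = -A x_i + (B - x_i) E_i - (x_i + C) I_i
```

The shunting form is non-linear because the inputs multiply the state
variable, which also keeps x_i bounded between -C and B.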

Patient Data 

Weekly patient data were linearly interpolated to yield daily data for
training. Data were converted to z-scores according to Equation 1:

$$ z = \frac{Q - \bar{Q}}{\sigma} \qquad (1) $$

where the Q's are daily training data, $\bar{Q}$ is the overall sample
mean, $\sigma$ is the standard deviation, and $\sigma^2$ is the overall
sample variance. The difference from each day to the next day was used as
the training data for the first derivative of each day. For the last day,
the first derivative was assumed to be the same as that of the previous
day.

Based on the premise that the symptoms are at equilibrium before
the onset of treatment, seven days of data were added before the beginning
of treatment. The training data for these added days (week -1 to week
0) were set to the pre-treatment (baseline) values. For this period,
training data for derivatives were set to zero.

Data from five weeks were used in the calculation of the F statistics
because the first week was used as the initial value.

In addition to linear interpolation, splining by third order 
polynomial was also considered. It was not adopted because it tended 
to create artifacts that manifested as large curvatures around 
endpoints that potentially would distort the fit. 
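
The preprocessing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function name `preprocess`, its arguments, and the use of NumPy are hypothetical, and the sample mean and standard deviation are assumed to be supplied by the caller.

```python
import numpy as np

def preprocess(weekly, mean, std, pad_days=7):
    """weekly: values of one symptom factor for one patient, week 0 first.
    Returns daily z-scores and their day-to-day differences, with
    pad_days of constant baseline prepended (assumed layout)."""
    weekly = np.asarray(weekly, dtype=float)
    week_days = np.arange(len(weekly)) * 7       # day index of each weekly visit
    days = np.arange(week_days[-1] + 1)          # day 0 .. last visit
    daily = np.interp(days, week_days, weekly)   # linear interpolation
    z = (daily - mean) / std                     # z-score conversion
    dz = np.empty_like(z)
    dz[:-1] = np.diff(z)                         # difference to the next day
    dz[-1] = dz[-2]                              # last day copies the previous day
    # Equilibrium week before treatment: baseline values, zero derivative.
    z = np.concatenate([np.full(pad_days, z[0]), z])
    dz = np.concatenate([np.zeros(pad_days), dz])
    return z, dz
```

For a six week study, `weekly` would hold seven values (weeks 0 to 6), giving 43 interpolated days plus the 7 added baseline days.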



Assumptions of the Model Design

Several assumptions were made to highlight behavioral aspects of
the effect of different treatments on and among the symptoms of
depression. These assumptions apply to both the first and second
order models.

Treatment Effect 

The first assumption was that treatments act directly on
symptoms, possibly by affecting neuromodulatory pathways acting on brain
regions that control the behavior manifested in the symptom. In both
models, this effect corresponds to the direct effect weights, i.e. the
strengths of the response in the pathway from treatment to symptoms.
Other possible causes, such as spontaneous recovery, sporadic
fluctuations of symptoms, life events, and anticipatory anxiety about
treatment termination, were not considered. Note that for both
models, in the absence of treatment the symptoms tended to converge to
baseline levels, which represented pre-treatment symptom scores rather
than non-depressed normal levels. Spontaneous recovery,
i.e., recovery that may be due to lifestyle changes, a supportive
environment, or other uncontrolled life events, was not considered
for this model.

Latency 

The second assumption was that there are two components of the
direct treatment effect described above. One component acts directly on
the symptoms, referred to as immediate, and the other reflects
underlying processes that cause a delay in the response, referred to
as delayed or latent. Latency was included in the model
because it has been observed in antidepressant drug response
{Quitkin:84,Quitkin:87} and was an open question for CBT
response. Latency is modeled by a parameter of the transfer function
of an idealized node. This node transforms elapsed time (linear) into
an overall latent effect (nonlinear). The latency is assumed to be
the same across all factors. The latency determines the time at which the
level of input (which increases linearly with the treatment duration)
results in half of the maximum possible output.

Interactions

The third assumption was that symptoms affect other symptoms,
possibly through interconnections among regions such as
transcortical connections, and through environmental and metabolic
feedback in response to the behavioral changes. This effect is modeled by
the coefficients (weights) of the links among symptom nodes.

Network Architecture

An overview of the architecture for both recovery models is shown in
Figure 3-3. It is independent of the treatment data and was used as the
architecture for both first and second order systems on the CBT and DMI
data. The intensity of each symptom (its HDRS score) is represented by
network nodes, which are shown as ellipses and are generally referenced as
300. These correspond to the activity levels of the nodes ($x_i$) in the
system of differential equations, which describes the behavior of
the network shown in Figure 3-4 and discussed below. Treatment direct
effects and interactions among symptoms correspond to weighted
connections (arrows, 320) in Figure 3-3. The bi-directional arrows 310 in
Figure 3-3 represent two separate weighted connections. The overall
latency of the response to treatment corresponds to the parameter ($\Delta t$)
of the delay node transfer function (the rectangle 330 labeled $\Delta t$).

Looking now at Figure 3-4, an annotated second order differential
equation used to model the pattern of recovery is illustrated. The
acceleration of a symptom is equal to the sum of a stabilizing factor
times the rate of symptom change, the interactions
between symptoms, and the treatment effects, both immediate effects,
which are represented by a step function, and delayed effects, which are
represented by a sigmoid function. The connection weights (coefficients in
the equations) in the architecture represent the strength of the direct
treatment intervention effects ($u_i$ for immediate effect, $v_i$ for latent
effect) and the strength of the interactions between pairs of symptoms
($w_{ij}$). As in Figure 3-3, the overall latency of the response to treatment
corresponds to the parameter ($\Delta t$) of the delay node transfer function, which
in turn corresponds to the delay node function $h(\alpha, t - \Delta t)$ in Figure 3-4.

Treatment Effect 

The direct effects of the treatment on the symptoms are called the
treatment effects. The intensity of the effect corresponds to the
value of the coefficient of the link from a treatment to a symptom
factor. A direct effect is inferred for symptoms whose recovery is
strongly affected by the treatment intervention.

In the second order model, it is assumed that the immediate direct
effect of treatment, represented by a step function, correlates
linearly with the acceleration (either an increase or a reduction) of
factors through immediate treatment effect coefficients. It is also
assumed that the latent direct effect of treatment, represented by a
sigmoid function of time, correlates linearly with the acceleration of
the factors through the latent treatment effect coefficients.

Latency 

Referring now to Figure 3-5, modeling of latency is illustrated. The
direct effects of treatment are either immediate 510 (step
function) or delayed 520 (sigmoid function). Delays are estimated for each
treatment from the patient data, using an optimization procedure.

Clinically, latency is defined as the response time of a symptom to a
treatment. For example, it is well established that antidepressant
drug treatments can take up to 4 weeks before the patient responds.
In the recovery model, latency ($\Delta t$) is defined as the time
from the beginning of treatment to when the effect of the treatment
has achieved half of its full accumulated effect on the symptom.

The recovery model's direct effects can occur through two treatment
pathways: one with latency, and one without latency. To separate and
thus capture the immediate and the delayed effects of treatment, two nodes
were added and trained on the data. As shown in Figure 3-5,
the pathway with latency is represented by a delay node
that is a sigmoid function with two parameters: delay and steepness of
the onset of the delayed effect. The pathway without latency is
represented by a step function fixed to coincide with the onset of the
treatment. (Note: The simulated time-course begins one week before
the onset of treatment.) All parameters were estimated by a training
algorithm.
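
The two treatment pathways can be illustrated with a short sketch. The function names `step` and `delayed` are hypothetical, and the logistic form of the sigmoid is an assumption chosen so that the output equals half of its maximum when the elapsed time equals the latency.

```python
import math

def step(t):
    """Immediate pathway: 0 before treatment onset (t < 0), 1 afterwards."""
    return 0.0 if t < 0 else 1.0

def delayed(t, alpha, latency):
    """Delayed pathway: logistic sigmoid (assumed form) that reaches 0.5
    at t == latency; alpha is the steepness of the onset."""
    return 1.0 / (1.0 + math.exp(-alpha * (t - latency)))
```

With time measured in days, `delayed(14, 0.5, 14)` evaluates to 0.5, i.e. a latency of two weeks; the steepness value 0.5 is purely illustrative.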

Interactions

Symptoms may affect each other. For example, increased energy
may increase productivity at work. This effect is modeled by a link from
a source symptom to a target symptom, as illustrated in Figure 3-6,
recovery model detail. Figure 3-6 shows the direct effects and interactions
in the recovery model, wherein $u_i$ represents the strength of the immediate effect of
treatment on symptom node $i$; $v_i$ represents the strength of the delayed
effect of treatment on symptom $i$; and $w_{ij}$ and $w_{ji}$ represent the interaction
between the symptoms: the strength of the effect of symptom $i$ on
symptom $j$ and the strength of the effect of symptom $j$ on symptom $i$,
respectively.

The second order model assumes that a source symptom's deviation
from its baseline intensity correlates with the acceleration of target symptoms
through interaction coefficients.

Derived Measures

Accumulated Interaction Strength

The calculation for the accumulated interaction strength of
symptoms utilizes the fact that the symptom factors were normalized by
shifting and scaling the data to have mean values equal to 0.0 and variance
values equal to 1.0, and that the maximum values of the step function
and sigmoid function are 1.0. This measure is a rough
approximation valid for the center of the range for the second order
model. Measures of interactions among symptoms were derived from the
second order equation 3.6 by ignoring indirectly
propagated influence (for instance, influence of factor $i$ on factor
$j$ via factor $k$). Variables and parameters that appear in these
equations are defined in Table 3.1.
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~ -WijBj (3.2) 



U t = J j { w *ii x i- B i) + Uis(t) + Vih{t)}d 2 t 

- -wuB t - Ui - Vi{l - ~f (3.3) 



where $T$ is the entire treatment period (six weeks),
$W_{ij}$ is the measure of total influence of symptom factor $j$ on
symptom factor $i$ when $x_j$ is small, and $U_i$ is the measure of
total influence of the treatment intervention on the symptom factor
$i$ when $x_i$ is small.

Latency of Each Factor: Half Reduction Time

To compare the patterns of response to treatments we needed to
construct the temporal structure of a patient's response. This meant
that we needed a way to determine when each symptom responded to
treatment. Based on the optimized model's prediction of a symptom's
response trajectory, a measurement was made of the time it takes for
the modeled symptom's intensity to decrease halfway from its initial
intensity to its intensity after six weeks of treatment. This
measurement is called the half reduction time (hrt). The hrt value
is a prediction by the model after it has been trained
on patient data, initialized with the baseline symptom values of a
single patient, and allowed to evolve in accordance with the
parameterized differential equations.

The half reduction time (response time) of a symptom $i$ ($hrt_i^P$) for
a given patient $P$ is formally defined, when it exists, as follows:

$$hrt_i^P = \{k \mid (k \in B_i) \;\&\; (\forall k' \in B_i \rightarrow k \le k')\} \qquad (3.4)$$

$$B_i = \{t \mid (x_i(0) > x_i(T)) \;\&\; (x_i(t) \le \tfrac{x_i(0) + x_i(T)}{2})\} \qquad (3.5)$$

where $x_i(t)$ is the predicted symptom factor value of a
patient on the $t$th day after the beginning of treatment, and $T$
is the end of the 6 weeks of treatment (thus $T = 6 \times 7 = 42$). This
represents the shortest time by which a symptom has fallen to the
average of its beginning and final value. Predicted symptom patterns
that did not decrease were excluded from the calculation of the half
reduction time mean.
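
The definition of the half reduction time can be illustrated with a short sketch. The function name `half_reduction_time` and the list-based input are assumptions made for illustration only.

```python
def half_reduction_time(x):
    """x: predicted daily values x[0..T] for one symptom of one patient.
    Returns the first day on which x has fallen to the midpoint of its
    initial and final values, or None if the symptom did not decrease."""
    start, end = x[0], x[-1]
    if not start > end:          # the set of qualifying days is empty
        return None
    midpoint = (start + end) / 2.0
    for day, value in enumerate(x):
        if value <= midpoint:
            return day           # smallest qualifying day
    return None
```

Returning `None` for a trajectory that does not decrease mirrors the exclusion of such patterns from the half reduction time mean.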

Range Score: Temporal Duration of Treatment Response

In addition to the response time, we were interested in examining the
temporal duration of the response. To address this aspect of the
recovery pattern, we constructed the range score, defined as the time
[number of days] between the day the first symptom reaches its half
reduction time and the day the last symptom reaches its half reduction time.
This score is based upon the half reduction times predicted by the model.
For individual patient trajectories, it is possible for the model to
predict that some symptom will simply not improve, and therefore the
half reduction time is not defined. The model did so in four (DMI) to
five (CBT) cases out of a total of 42 possible half reduction
times. To fill in missing data, two approaches were considered. One
approach omits the patient's data for that symptom from the analysis;
the other approach replaces the missing value with a hypothetical
minimum or maximum depending on what occurred in the actual data of
that patient.

The two approaches for filling in data where the half reduction time
was undefined are:

1. If the symptom was not present (and therefore could not improve)
then use the value zero days for the half reduction time for that
symptom.

2. If the symptom was present, and either stayed at the same level
throughout the six weeks or worsened, then use the value equivalent to
the maximum possible value, i.e. 42 days (six weeks), for the half
reduction time for that symptom.

The more conservative approach, which omits the symptom from the
calculation, was adopted. This approach is more conservative because
it reduces the number of data points available for statistical
analyses and thus makes it harder to achieve statistical significance.
For this measure in particular, the omission of a symptom, depending
on the symptom, can have a large impact on the range score. Thus, if
the symptom is one whose mean reduction time is on one of the extremes
(either very short or very long) then its omission will shorten the
range score for that patient and make it harder to show significant
differences in the response patterns of different treatments.
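
A minimal sketch of the range score under the adopted omission approach follows. The function name `range_score` and the use of `None` to mark an undefined half reduction time are assumptions for illustration.

```python
def range_score(hrts):
    """hrts: per-symptom half reduction times (days) for one patient,
    with None where the half reduction time is undefined.
    Undefined entries are omitted, as in the adopted approach."""
    defined = [h for h in hrts if h is not None]
    if not defined:
        return None
    return max(defined) - min(defined)
```

For example, omitting an undefined entry from `[5, None, 20, 12]` gives a range score of 15 days, whereas replacing it with the 42 day maximum would have given 37 days.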

Most of the statistical tests and discussion are based on derived
measures, in particular, the model-dependent half reduction times.
There are two reasons for this approach. First, Tables
3.3 and 3.4 show that the fits of the model to the data are highly significant.
The highly significant results suggest that the model captures aspects of
the data and it is therefore appropriate to study the model's behavior.
Second, the subsequent section "Use of the Predicted Half Reduction Time Derived
Measure" shows that when predicted and actual half
reduction times both are defined, they are highly correlated. The section
"Results: Statistical Inferences" is devoted to the elucidation of the
differing patterns that resulted from training on two different
treatment groups. There, the computed half reduction times were used
to quantify the results obtained from model predictions based on
individual initial conditions.

Models Considered 

First and Second Order Systems

One of the two classes of models used in this study was a system of
linear second order differential equations. The second order model is
presented in detail because it was the model ultimately chosen. The
equations can be understood by their analogy to equations familiar from
kinematics. Variables $\ddot{x}_i$, $\dot{x}_i$, and $x_i$ can be thought of as
acceleration, velocity, and displacement, respectively. Each symptom
of each patient is assigned a baseline value $B_i$, reflecting its
pre-treatment value. A deviation of the intensity of a symptom value $x_i$
from its baseline value $B_i$ gives rise to two kinds of
forces. The restoration force, a product of the deviation and the
coefficient $w_{ii}$, tends to return the symptom to its baseline
value $B_i$ (and therefore $w_{ii}$ is constrained to be negative).
An interaction force, a product of the deviation and $w_{ji}$, links
the strength of the symptom to the acceleration of other symptoms and
thereby causes the other symptoms to covary. (The sign of the coefficient
indicates whether improvement in the symptom will promote or impede
improvement in another covarying symptom.)



Second Order Model System

$$\ddot{x}_i = -A_i \dot{x}_i + \sum_{j=1}^{N} (x_j - B_j) w_{ij} + s(t) u_i + h(\alpha, t - \Delta t) v_i \qquad (3.6)$$

$$s(t) = \begin{cases} 0 & t < 0 \\ 1 & \text{otherwise} \end{cases} \qquad (3.7)$$

$$h(\alpha, t - \Delta t) = \frac{1}{1 + e^{-\alpha (t - \Delta t)}} \qquad (3.8)$$

The meaning of each term in equation 3.6 is labeled in
Figure 3-4 and Table 3.1. The value of variable $x_i$, where $i = 1, 2, \ldots, N$ and
$N = 7$ is the number of symptoms under study, is the predicted HDRS score
of symptom $i$. Parameters are defined as follows: $A_i$ is a damping
First Order (Shunting) Model

Another model class that was explored in the current research was a first order
shunting model (Grossberg:82c) of the following form:

$$\dot{x}_i = -A_i (x_i - D_i) + (B_i - x_i)\Big(\sum_{j}^{N} w^{+}_{ij} x_j + u^{+}_i s(t) + v^{+}_i h(\alpha, t - \Delta t)\Big) - (C_i + x_i)\Big(\sum_{j}^{N} w^{-}_{ij} x_j + u^{-}_i s(t) + v^{-}_i h(\alpha, t - \Delta t)\Big) \qquad (3.9)$$

where $A_i$ is a decay constant, $B_i$ is an upper limit of a factor, $C_i$ is a lower limit, $D_i$ is
a baseline, $w^{+}_{ij}$ is an excitatory interaction coefficient, $w^{-}_{ij}$ is an inhibitory interaction
coefficient, $u^{+}_i$ is an excitatory immediate direct effect coefficient, $u^{-}_i$ is an inhibitory
immediate direct effect coefficient, $v^{+}_i$ is an excitatory latent direct effect coefficient,
and $v^{-}_i$ is an inhibitory latent direct effect coefficient, $\alpha$ is the steepness of latent
onset of treatment effect, and $\Delta t$ is the latency for the treatment effect.

In a clinical sense, $A_i$ corresponds to the quickness of the symptom to go to the
baseline value if effects of treatments and other symptoms were removed. $B_i$ and $C_i$
correspond to upper and lower limits of the symptom, in the sense that when the
symptom value approaches one of these limits the change slows down. $w^{+}_{ij}$ is the
interaction coefficient between symptoms when a high value of symptom $j$ tends to
coincide with an increase of $i$, and a low value of symptom $j$ tends to coincide with
a decrease of $i$. $w^{-}_{ij}$ is the interaction coefficient between symptoms when the sign of
correlation is the opposite. Thus, at most one of $w^{+}_{ij}$ and $w^{-}_{ij}$ is non-zero for a given
Training Procedure

The parameters which yield good fits to the data were obtained
through learning. This section describes the processes and data that were
involved in learning. Referring now to Figure 3-7, which provides a flow
chart of the training process, parameters were initialized with a regression
matrix which was calculated from actual symptom values (ASV) by
correlation and regression analyses. The model used these initial
parameters to predict symptom values (model symptom values, MSV) of
each patient from baseline. The optimization process iteratively
modified parameters to minimize the discrepancy between MSV and ASV.

MSV are daily symptom factor values starting from one week prior
to the onset of treatment, whereas ASV are weekly data starting from the
onset of treatment. Prior to the optimization process, ASV were
transformed into the same format as MSV. This was done by extending
the ASV by one week (from week 0 to week -1). It was assumed that the
symptom factor values before the beginning of treatment were constant
and equal to the baseline. A linear interpolation was used to extend
the data. The extension was necessary because the model had to learn
from the data the premise that the symptom factor values stay constant
without treatment. The reason the data were interpolated to be daily
rather than weekly was that the theories of differential equations and
optimal control are continuous, and thus require finer time resolution
than was available in weekly data from a six week study.

The learning (training) algorithm was adopted from optimal control
theory {BryHo:75} and is described in detail for the second order
model only. Parameters for the Shunting Model may be found in U.S.
Provisional Application S.N. 60/041,287 filed on March 20, 1997, the
disclosure of which is incorporated herein by reference.

The goal of the training procedure is to find the best model
parameters. The method is to reduce the discrepancy between the
prediction of the model and the actual data. To do this, model
parameters are incrementally changed so that the discrepancy between
the actual and simulated time series is gradually decreased. The
discrepancy $L$ (Figure 3-8), also called the Lagrangian, was defined as an
integral of the squared difference between the predicted and actual
symptom values through time. Later the Lagrange multiplier $\mu$, which
represents a constraint that the differential equation must hold, is
introduced, and will serve to simplify calculation of the gradient.

Referring now to Figure 3-8, a schematic description of the cost
function $L$ is illustrated. The formula inside the integration has two
terms, 810 and 820 respectively. By minimizing the first term 810,
discrepancies between estimated and actual patterns of recovery are
minimized. The second term 820 is a constraint term which states that the
differential equation must hold.

Estimating initial values of model parameters 

Vector auto regression analysis was applied to obtain the initial
estimates of the model's parameters. The coefficient matrix in the
first order differential equation 3.13 is analogous to an auto
regression matrix when the equation is approximated by a difference
equation. Therefore, a first order regression matrix was computed, and a
part of the matrix was used to calculate initial values of the parameters of
the differential equations.

Second order equations for $x_i$ (Equation 3.6) were first
decomposed into a set of first order differential equations.

$$\dot{x}_i = y_i \qquad (3.10)$$

$$\dot{y}_i = -A_i y_i + \sum_{j} (x_j - B_j) w_{ij} + u_i s(t) + v_i h(t) \qquad (3.11)$$

Or, in a matrix form,

where $P$ is the set of parameters in this equation ($w_{ij}$, $A_i$, $B_i$, $u_i$, $v_i$).
Initial values of the parameters were estimated using regression
analysis, and then optimized through a training procedure {BryHo:75}.
The auto regression matrix was calculated for a vector $[X_i]$ which
includes the symptom variables $x_i$, their derivatives $y_i$, and the
immediate intervention effect $s(t)$.



$$X' = [x_1 \ldots x_n \;\; s(t) \;\; y_1 \ldots y_n \;\; s'(t)]^T \qquad (3.14)$$



In this initial estimation process, the immediate intervention effect
from $s(t)$ was treated as another variable, and the latent
intervention effect from $h(t)$ was ignored. Although $s'(0)$ is
undefined, it is assumed to be 1.0, the difference $s(0) - s(-1)$.
With these preparations, the calculation of the auto
regression matrix and extraction of the initial parameters of the
differential equations from the auto regression matrix were carried
out as follows:

Step 1: Compute a regression matrix

Covariance matrices of $X'$ with two different time intervals,
$\Lambda(1)$ and $\Lambda(2)$, were calculated, the results of which
were used to calculate an auto regression matrix. First order
regression on a time series vector $X'(t)$ was defined as
follows.

$$X'(t+1) = \Phi_1 X'(t) + r(t) \qquad (3.15)$$

where $\Phi_1$ is the first order regression matrix, and $r(t)$ is a
disturbance (white noise) vector. $\Phi_1$ is calculated from
correlation matrices,

where $\Lambda(k)$ is the $k$th covariance matrix. A covariance matrix is
calculated from the actual time series data $X'$ as an average of
covariance over time $t$ and over patients.



Step 2: Compute the Transition Matrix

This step estimates $P'$, a transition matrix of symptoms
from which initial parameters will be extracted. The transition matrix is
a parameter in the state space difference equation that approximates
the differential equation.

Derivatives $y_i$ in $X'$ are approximated by the first
difference $x_{i,t} - x_{i,t-1}$. The unit of time is weeks because the
HDRS symptom measurements were obtained weekly.

$P'$ is calculated based on $\Phi_1$, the auto-regression matrix calculated in
Step 1. From equations 3.15 and 3.18,

$$X'(t+1) - X'(t) = P' X'(t) \qquad (3.19)$$

$$X'(t+1) = X'(t) + P' X'(t) \qquad (3.20)$$

$$= (I + P') X'(t) \qquad (3.21)$$

$$P' = \Phi_1 - I \qquad (3.22)$$

where $I$ is an identity matrix.
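
Steps 1 and 2 can be sketched as follows. This is only an illustration: the function name `transition_matrix` and the data layout are hypothetical, and the regression matrix is estimated here by pooled least squares, which is an assumed stand-in for the covariance-based calculation referred to in Step 1.

```python
import numpy as np

def transition_matrix(series):
    """series: list of arrays, one per patient, each of shape (n_weeks, d),
    holding the weekly state vector X'(t). Returns (Phi1, P_prime)."""
    past = np.vstack([s[:-1] for s in series])      # rows are X'(t)
    future = np.vstack([s[1:] for s in series])     # rows are X'(t+1)
    # Least-squares solution of future ~= past @ Phi1.T, pooled over patients.
    phi1_T, *_ = np.linalg.lstsq(past, future, rcond=None)
    phi1 = phi1_T.T
    p_prime = phi1 - np.eye(phi1.shape[0])          # P' = Phi1 - I
    return phi1, p_prime
```

Pooling all patients into one regression corresponds to averaging over time and over patients; whether the original work weighted patients equally is not stated in the text.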



Step 3: Extract Initial Parameters

An examination of the inner structure of $P$ in equation
3.13 showed that it was appropriate to initialize
the parameters as follows.

$$A_i^0 = -P'_{i'i'} \qquad (3.23)$$

$$w_{ij}^0 = P'_{i'j} \qquad (3.24)$$

$$u_i^0 = P'_{i',N+1} \qquad (3.25)$$

where $i' = i + n = i + N + 1$. $N$ is the number of symptom factors and $N + 1$ is the
index corresponding to the intervention variable $s(t)$.
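
The index bookkeeping of Step 3 can be illustrated as follows. The function name `extract_initial_parameters` is hypothetical, the code uses 0-based indices, and it assumes the state vector is ordered as the symptom variables, then $s(t)$, then the derivatives, then $s'(t)$.

```python
import numpy as np

def extract_initial_parameters(p_prime, n_symptoms):
    """p_prime: (2N+2, 2N+2) transition matrix over the assumed ordering
    [x_1..x_N, s, y_1..y_N, s']. With 0-based indices, row N+1+i is the
    difference equation for the derivative y_i."""
    N = n_symptoms
    rows = np.arange(N) + N + 1           # i' = i + N + 1
    A0 = -p_prime[rows, rows]             # minus the own-derivative coefficient
    w0 = p_prime[rows, :N].copy()         # w0[i, j]: coefficient of x_j for y_i
    u0 = p_prime[rows, N].copy()          # coefficient of the step input s(t)
    return A0, w0, u0
```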
The following parameters were initialized to constants. 

Bf = 0 (3-26) 

0 = tmax (327) 

2 

&t° = 0 (3-28) 

v° = 0 (3-29) 



Optimization 

The goal of the optimization process was to find the best parameters, i.e. those
parameter values that yield the best fit to the data. This was accomplished through
minimizing $L$, the squared error integrated over time. Each term is described in
Figure 3-8.

$$L(P) = \frac{1}{2} \int_0^T \|R(O(t) - X(P,t))\|^2 \, dt + \frac{1}{2} K \|P\|^2 \qquad (3.30)$$

where

$$R_{ij} = \begin{cases} 0 & \text{if } i \neq j \\ 1 & \text{if } i = j \le N \\ \lambda & \text{if } i = j > N \end{cases} \qquad (3.31)$$
where $P$ is the set of parameters, $O(t)$ is the training data
(pre-processed as described above in "Patient Data"), and
$X(P,t)$ is the value of system equation 3.11 at time $t$.
Diagonal elements of $R$ (equation 3.31) determine the
relative importance of minimizing the error (equation 3.30)
for each element in $X$. If $\lambda = 0$ then the optimization is
insensitive to errors in the derivatives. If $\lambda = 1$ then the
optimization evaluates, with the same importance, the errors in the
derivatives and the errors in the variables. When our objective was to
compare the shunting and second order systems, we ran the simulations
with $\lambda$ set to zero so that the same error function would be
used for the comparison (see Table 3.3). The term
$K\|P\|^2$ is used to keep the magnitude of the parameters from becoming
large. $X$ is the concatenated vector of factors $x_i$ and their
derivatives $y_i$. It is similar to $X'$ except that it does not
include treatment intervention variables. The value of $K$ was chosen
empirically (see "Optimization Procedure").

$$X = [x_1 \ldots x_n, \; y_1 \ldots y_n]^T \qquad (3.32)$$

Integration was carried out using the fourth order Runge-Kutta method
with a time step of 1 [day]. Because the initial data were weekly, the data
were linearly interpolated to daily data to get the non-derivative
part of the training data $O$. The derivative part of $O$ was approximated by daily
differences.
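
The forward integration can be sketched as below. This is an illustrative sketch rather than the original code: the names `rhs` and `simulate`, the parameter ordering, the start time of one week before treatment, and the logistic form of the sigmoid input are all assumptions.

```python
import numpy as np

def rhs(t, state, A, B, w, u, v, alpha, latency):
    """Second order model written as a first order system:
    state = [x_1..x_N, y_1..y_N] with dx/dt = y."""
    N = len(A)
    x, y = state[:N], state[N:]
    s = 0.0 if t < 0 else 1.0                            # step input
    h = 1.0 / (1.0 + np.exp(-alpha * (t - latency)))     # sigmoid input (assumed logistic)
    dy = -A * y + w @ (x - B) + u * s + v * h
    return np.concatenate([y, dy])

def simulate(x0, days, params, t0=-7.0, dt=1.0):
    """Classical fourth order Runge-Kutta with a fixed step of dt days."""
    state = np.concatenate([x0, np.zeros_like(x0)])      # start at rest
    t, out = t0, [state.copy()]
    for _ in range(days):
        k1 = rhs(t, state, *params)
        k2 = rhs(t + dt / 2, state + dt / 2 * k1, *params)
        k3 = rhs(t + dt / 2, state + dt / 2 * k2, *params)
        k4 = rhs(t + dt, state + dt * k3, *params)
        state = state + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += dt
        out.append(state.copy())
    return np.array(out)
```

Here `params` is the tuple `(A, B, w, u, v, alpha, latency)`. With zero treatment coefficients and an initial state equal to the baseline, the trajectory stays at the baseline, which matches the equilibrium premise stated for the pre-treatment week.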

The gradient descent technique requires partial derivatives of $L$
with respect to the components of $P$. To simplify the form of the
partial derivatives, a coefficient called the Lagrange multiplier
($\mu(t)$) was introduced. The optimization process aims to minimize
the quantity $L$, the Lagrangian. The term multiplying $\mu(t)$
is defined to be zero, as is explained below. This allows the meaning
of $L(P)$ to remain unchanged from the error function, equation 3.30,
while allowing the form of $L(P)$ to be amenable
to the computation of the gradients with respect to parameters
{BryHo:75}.

$$L(P) = \int_0^T \Big[ \tfrac{1}{2} \|R(O(t) - X(P,t))\|^2 + \mu(t) \big( \dot{X}(P,t) - f(P, X(P,t)) \big) \Big] dt + \tfrac{1}{2} K \|P\|^2 \qquad (3.33)$$

In this equation, $f(P,X)$ is the right hand side of the original
differential equation 3.13, satisfying

$$\dot{X}(P,t) = f(P, X(P,t), t). \qquad (3.34)$$

Thus the term with $\mu(t)$ in the cost function, equation 3.33,
is always zero at the local minima, and therefore $\mu(t)$ can be
determined arbitrarily to make the form of the partial derivative simple,
i.e. not explicitly dependent on the parameters.



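The partial derivative of $L$ with respect to parameter $P_j$ is

$$\frac{\partial L}{\partial P_j} = \int_0^T \sum_i \Big\{ R_{ii}(X_i - O_i) \frac{\partial X_i}{\partial P_j} + \mu_i \Big( \frac{\partial \dot{X}_i}{\partial P_j} - \sum_k \frac{\partial f_i}{\partial X_k} \frac{\partial X_k}{\partial P_j} - \frac{\partial f_i}{\partial P_j} \Big) \Big\} dt + K P_j$$

$$= \int_0^T \sum_i \Big\{ \Big( R_{ii}(X_i - O_i) - \sum_k \mu_k \frac{\partial f_k}{\partial X_i} \Big) \frac{\partial X_i}{\partial P_j} - \mu_i \frac{\partial f_i}{\partial P_j} \Big\} dt + \sum_i \int_0^T \mu_i \frac{\partial \dot{X}_i}{\partial P_j} dt + K P_j$$

$$= \int_0^T \sum_i \Big\{ \Big( R_{ii}(X_i - O_i) - \sum_k \mu_k \frac{\partial f_k}{\partial X_i} - \dot{\mu}_i \Big) \frac{\partial X_i}{\partial P_j} - \mu_i \frac{\partial f_i}{\partial P_j} \Big\} dt + \sum_i \Big[ \mu_i \frac{\partial X_i}{\partial P_j} \Big]_0^T + K P_j \qquad (3.35)$$

Integration by parts was used in the derivation from the second to the
third line above.

$$\int_0^T \mu_i \frac{\partial \dot{X}_i}{\partial P_j} dt = \Big[ \mu_i \frac{\partial X_i}{\partial P_j} \Big]_0^T - \int_0^T \dot{\mu}_i \frac{\partial X_i}{\partial P_j} dt \qquad (3.36)$$

Because $X_i$ is defined as an implicit function of $P_j$, it is
difficult to calculate $\partial X_i / \partial P_j$, which is
contained in the first term of equation 3.35. The
necessity to calculate $\partial X_i / \partial P_j$
was eliminated by constraining the term multiplying it to be zero. This
is accomplished by setting

$$\dot{\mu}_i = R_{ii}(X_i - O_i) - \sum_k \mu_k \frac{\partial f_k}{\partial X_i}, \qquad \mu_i(T) = 0 \qquad (3.37)$$

If it is assumed that $X(0)$ is given, and therefore
$\partial X_i / \partial P_j |_{t=0} = 0$, then

$$\Big[ \mu_i \frac{\partial X_i}{\partial P_j} \Big]_0^T = 0 \qquad (3.38)$$

From equations 3.35, 3.36, and 3.38,

$$\frac{\partial L}{\partial P_j} = -\int_0^T \sum_i \mu_i \frac{\partial f_i}{\partial P_j} dt + K P_j \qquad (3.39)$$

Thus we got a simpler form of the gradient under the condition of
satisfying equation 3.37. This condition is met by solving equation
3.37 for $\mu_i$.
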

Optimization Procedure

The steps in the optimization procedure are as follows:
(0) Get first patient's data.
(1) Solve the differential equation 3.11 for symptom factors.
(2) Solve the differential equation 3.37 for Lagrange
multipliers.
(3) Calculate the partial derivatives and update the parameters.
(4) Calculate the cost function $L$ given in equation 3.33.
(5) Unless one of the following holds, stop and terminate the optimization:
    The average of the absolute value of $L$ for the 4 most recent cycles
    decreased from the preceding 4 cycles by more than 0.01%.
    Fewer than 300 cycles have been processed.
(6) Get next patient's data. If there are no more patients, then
start over with the first patient's data. Go to (1).

Differential equations were solved using the fourth order Runge-Kutta
method with a step size of 1.0 [day].
The explanation for each step follows.

Solve the differential equation for symptom factors

To predict the time course of symptom factors, integrate [forward]
equations 3.10 and 3.11. The notation in equation 3.13 is changed here to
separate the variable vector into non-derivative $x_i$ and derivative $y_i$ parts.

(3.40)
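
Returning to step (5) of the optimization procedure, the continuation test can be sketched as follows. The function name `should_continue` and its arguments are hypothetical, and reading "decreased by more than 0.01%" as a relative drop between the two 4-cycle averages is an assumption.

```python
def should_continue(costs, window=4, min_rel_drop=1e-4, min_cycles=300):
    """costs: cost L recorded after each completed cycle, oldest first.
    Continue while fewer than min_cycles cycles have run, or while the
    mean |L| of the last `window` cycles is more than 0.01% below the
    mean of the `window` cycles before them."""
    if len(costs) < min_cycles:
        return True
    recent = sum(abs(c) for c in costs[-window:]) / window
    previous = sum(abs(c) for c in costs[-2 * window:-window]) / window
    return (previous - recent) > min_rel_drop * previous
```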






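Solve the differential equation for Lagrange multipliers

To solve for the Lagrange multipliers, integrate Equation 3.37
backwards (i.e. from $t = T$ to $t = 0$).

$$\dot{\mu}_{x_i} = (x_i - O_{x_i}) - \sum_k \mu_{x_k} \frac{\partial f_{x_k}}{\partial x_i} - \sum_k \mu_{y_k} \frac{\partial f_{y_k}}{\partial x_i}$$

$$= (x_i - O_{x_i}) - \sum_k \mu_{y_k} \frac{\partial}{\partial x_i} \sum_j (x_j - B_j) w_{kj}$$

$$= (x_i - O_{x_i}) - \sum_k \mu_{y_k} w_{ki}$$

$$\dot{\mu}_{y_i} = \lambda (y_i - O_{y_i}) - \mu_{x_i} + \mu_{y_i} A_i$$

$$\mu_{x_i}(t) = \mu_{x_i}(T) + \int_0^{T-t} \big( -\dot{\mu}_{x_i}(T - u) \big) du$$

$$\mu_{y_i}(t) = \mu_{y_i}(T) + \int_0^{T-t} \big( -\dot{\mu}_{y_i}(T - u) \big) du$$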
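
The backward integration of the multipliers can be sketched as follows. The function name `solve_multipliers` and the array layout are hypothetical, and explicit Euler steps are used here for brevity in place of the Runge-Kutta method named in the text.

```python
import numpy as np

def solve_multipliers(x, y, Ox, Oy, A, w, lam=1.0, dt=1.0):
    """x, y, Ox, Oy: arrays of shape (T+1, N) holding simulated and
    target values; A: (N,) damping; w: (N, N) interactions.
    Integrates the multiplier equations backwards from mu(T) = 0 with
    explicit Euler steps (a simplification of the described method)."""
    steps, N = x.shape
    mu_x = np.zeros((steps, N))
    mu_y = np.zeros((steps, N))
    for t in range(steps - 1, 0, -1):
        dmu_x = (x[t] - Ox[t]) - w.T @ mu_y[t]           # sum_k mu_yk * w_ki
        dmu_y = lam * (y[t] - Oy[t]) - mu_x[t] + mu_y[t] * A
        mu_x[t - 1] = mu_x[t] - dt * dmu_x               # step backwards in time
        mu_y[t - 1] = mu_y[t] - dt * dmu_y
    return mu_x, mu_y
```

When the simulated trajectory matches the targets exactly, both multipliers remain zero, so the data term contributes nothing to the gradient.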
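Calculate partial derivatives and update parameters

$$\Delta P_j = -\varepsilon \frac{\partial L}{\partial P_j} \qquad (3.45)$$

where $P_j$ is a general term for the parameters $A_j$, $B_j$, $w_{jk}$, $u_j$, $v_j$, $\alpha$, and $\Delta t$. The
correspondence can be, for instance, $P_1 = A_1$, $P_2 = A_2$, ..., $P_n = A_n$, $P_{n+1} = B_1$,
$P_{n+2} = B_2$ and so on. The learning constant $\varepsilon$ was set to 0.0001
and the parameter magnitude constraint coefficient $K$ was set to 0.0001.

$$\Delta A_j = \varepsilon \Big( \int_0^T -\mu_{y_j} y_j \, dt - K A_j \Big) \qquad (3.46)$$

$$\Delta B_j = \varepsilon \Big( \int_0^T -\sum_i \mu_{y_i} w_{ij} \, dt - K B_j \Big) \qquad (3.47)$$

$$\Delta w_{jk} = \varepsilon \Big( \int_0^T \sum_i \mu_i \frac{\partial f_i}{\partial w_{jk}} dt - K w_{jk} \Big) = \varepsilon \Big( \int_0^T \mu_{y_j} (x_k - B_k) \, dt - K w_{jk} \Big) \qquad (3.48)$$

$$\Delta u_j = \varepsilon \Big( \int_0^T \mu_{y_j} s(t) \, dt - K u_j \Big) \qquad (3.49)$$

$$\Delta v_j = \varepsilon \Big( \int_0^T \mu_{y_j} h(t) \, dt - K v_j \Big) \qquad (3.50)$$

$$\Delta \alpha = \varepsilon \Big( \int_0^T (t - \Delta t) e^{-\alpha (t - \Delta t)} H^2(\alpha, \Delta t, t) \sum_i \mu_{y_i} v_i \, dt - K \alpha \Big) \qquad (3.51)$$

$$\Delta(\Delta t) = \varepsilon \Big( \int_0^T \sum_i \mu_{y_i} v_i \frac{\partial h}{\partial \Delta t} dt - K \Delta t \Big) \qquad (3.52)$$

$$= \varepsilon \Big( \int_0^T \alpha e^{-\alpha (t - \Delta t)} (-1) H^2(\alpha, \Delta t, t) \sum_i \mu_{y_i} v_i \, dt - K \Delta t \Big) \qquad (3.53)$$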



where 



#(a,Ai,f) = Ma, Ai, 0 + 0.5 



(3.54) 
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1 + e -a(t-At) ( 3 - 55 ) 

Similar equations adapted from optimal control theory were used to
find parameters for the shunting model shown in Luciano, U.S. Provisional
Application S.N. 60/041,287 filed on March 20, 1997, the disclosure of
which is incorporated herein by reference.

10 Results: Introduction and Rationale 

This section and the following three sections present the results of
the optimization procedure on linearly interpolated weekly data that
were used to estimate the parameters of a single model for each
treatment group. Each patient's week zero data were used as the initial
conditions for a patient-specific run of the treatment group
parameterized model to see the patient-specific predicted evolution of
the symptom factors. The symptom half reduction times predicted by
the group-parameterized, but individually-initialized, runs were then
computed and the resulting numbers used in the Mann-Whitney analyses
of these data. Unless otherwise noted, quantitative references to
symptoms, symptom factors, or modeled symptom values (MSV) are
references to model predictions and not original data.

The section "Quantitative Fit of Model to Data" below shows the
correlation between the model's predicted recovery patterns and the
actual recovery patterns and justifies the relevance of the
Mann-Whitney U tests presented in subsequent sections. "Results: Model
Choice" presents the goodness of fit statistics that justified the
choice of the linear second order system over the first order
(shunting) system to model the recovery pattern. "Results: Statistical
Inferences" presents the differences in the treatment models obtained
through statistical analysis of the half reduction times predicted by
the second order model. "Results: Parameter Choice" presents the
parameters obtained for the two treatment models. These treatment group
optimized parameters capture the essence of the different
characteristics found in the patterns of recovery for the two
treatments.

Quantitative Fit of Model to Data

The second order model (discussed below) predicts an aspect of the
actual recovery patterns that it was not trained on, namely the symptom
half reduction time, and its predictions correlate with the actual
values. This is evidence that the model has captured some of the
underlying dynamics of the individual symptoms.

The statistic for the goodness of fit of the second order model to the
data is presented in Table 3.3. The reported F-statistic values are
meant to be rough indicators of the goodness of fit and should be
taken with caution for the following three reasons: (1) the assumption
of data independence is violated (because the target data were time
series data and therefore not independent); (2) the data were not
partitioned into disjoint training and test sets; and (3) about half of
the raw data were eliminated because the half reduction time was not
defined in the actual data or in the model predicted symptom
trajectory. As additional patient data are added to the model, the
F-statistic values should gain value as indicators of goodness of fit,
thus increasing the predictive value of the model.

However, notwithstanding the statistical reliability questions raised
by the violation of the assumptions, the level of significance
obtained was high enough (p < 1 x 10^-5) to justify further study of
the predicted recovery patterns. Below it is demonstrated that the
model predictions for the values of the symptom half reduction times,
to which the model was blind during training (and which are therefore
an independent measure), are highly correlated with the half reduction
times of the actual data. Therefore, the statistical study of half
reduction times is provided in "Results: Statistical Inferences".

Use of the Predicted Half Reduction Time Derived Measure 

In "Results: Statistical Inferences" the half reduction time measure
is used to quantify predicted aspects of the treatment dependent
models. This is justified for the following two reasons. First, the
fit to the data of the second order model is highly significant (shown
in Table 3.3 and discussed below). Second, the half reduction times
computed from the model's predictions were regressed against the half
reduction times computed from the raw data to determine the
relationship between them, and the results indicated that they were
significantly correlated overall (p < 0.01, shown in Table 3.2).
Furthermore, the correlation between the CBT model predicted half
reduction time values and the actual data half reduction time values
is highly significant. However, the model fit to the data is not as
good for DMI (see Table 3.3). In this case, the correlation between
the predicted and actual half reduction times for DMI (shown in Table
3.2) is not significant. This suggests that the comparisons of the half
reduction times between CBT and DMI, and within the DMI group, may not
be directly reflected in the raw data; however, this cannot be
determined without further data from recovering patients. More data are
needed because, in the data utilized, there were many cases where the
half reduction time was not defined, either because a symptom was not
present or because it did not improve in either the raw data or the
model predictions. In these cases, the half reduction time could not be
used in the calculation of the correlation reported in Table 3.2.



Table 3.2: Predicted and Actual Symptom Half Reduction Time Statistics.
Statistics were calculated between the actual half reduction times
(from linearly interpolated data) and the model predicted half
reduction times. r is Pearson's correlation coefficient, r^2 is the
proportion of variance, t is a Student's t-statistic, and p is the
probability for the null hypothesis to hold.

Half Reduction Time Correlation Results

Group        N     r^2      t        p
CBT + DMI    44    0.1955   3.1943   < 0.01
CBT          21    0.5852   5.1776   < 0.001
DMI          23    0.0515   1.0681   ns
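The t values in Table 3.2 can be recomputed from the proportion of variance and the sample size using the standard test for a Pearson correlation, t = r * sqrt((N - 2) / (1 - r^2)). The sketch below assumes that this is the test behind the table (the text does not say so explicitly); the function name is hypothetical.

```python
import math


def t_from_r2(r2, n):
    """Student's t for a Pearson correlation with n pairs.

    Uses t = r * sqrt((n - 2) / (1 - r^2)), written in terms of r^2.
    """
    return math.sqrt(r2 * (n - 2) / (1.0 - r2))


# (N, r^2) pairs copied from Table 3.2.
table_3_2 = {"CBT + DMI": (44, 0.1955), "CBT": (21, 0.5852), "DMI": (23, 0.0515)}

for group, (n, r2) in table_3_2.items():
    print(f"{group:10s} t = {t_from_r2(r2, n):.3f}")
```

The recomputed values agree with the tabulated t statistics to within rounding, which also supports reading the unlabeled column as the proportion of variance.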
Results: Model Choice

Qualitative Reasons for Choice of Second Order System 

Referring now to Figure 3-9, predicted patterns of recovery produced
using (a) the shunting and (b) the second order equations are shown,
wherein the solid lines show actual patterns derived from patient
clinical data and dotted lines show predicted patterns. Numbers shown
at the vertical axis are scaled such that the possible maximum symptom
factor value yields 1.0. The plot at the bottom right in both (a) and
(b) shows the error L on the ordinate axis plotted against the number
of training cycles on the abscissa. Note that the absolute values of
the error measure L cannot be compared between the shunting and second
order equations, because for the latter L includes errors in the
derivatives. As can be seen in Figure 3-9 (CBT patient 1840201 MOOD),
oscillatory patterns were discovered in the data.

Figure 3-10 illustrates plots of individual recovery patterns with
time in response to either CBT treatment or DMI treatment. Plots show
patterns of recovery by symptom, for example a1 "anxiety", a2
"cognitions", a3 "mood", a4 "work", a5 "energy", a6 "early sleep", and
a7 "middle and late sleep", monitored in four patients a, b, c, and d,
except for a8, b8, c8, and d8, which show the error trend for each
patient monitored. A solid line represents the actual pattern of
recovery exhibited by a patient in response to treatment. A dotted line
represents the predicted pattern of recovery. Numbers shown at the
vertical axis are scaled such that the possible maximum symptom factor
value yields 1.0. Plots a8, b8, c8, and d8 show the error L on the
ordinate axis plotted against the number of training cycles shown on
the abscissa.

When the patterns of recovery were plotted for individual patients who
responded to either of two treatments (CBT or DMI), it was discovered
that some of the recovery patterns seem to have oscillatory components.
Figure 3-10 (DMI patient 1810101 MOOD and CBT patient 1800201
COGNITIONS) illustrates this. Oscillatory components can be captured
naturally by second order or higher order equations. First order
equations can model oscillations only by interactions among variables.
Therefore, if there is an oscillation observed in one symptom factor
in a first order system, there has to be another symptom factor, or
some covert factor, oscillating at the same rate. This type of coupled
oscillation was not observed in the overt factor data.
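The distinction between first and second order dynamics can be illustrated with a small simulation. The sketch below is not the fitted model: it compares a single uncoupled first order decay with a single damped second order equation, and the coefficients (decay rate 1.0, damping 0.3, stiffness 4.0), the Euler step, and all function names are hypothetical values chosen only to make the behavior visible.

```python
def simulate(deriv, state, dt=0.01, steps=2000):
    """Euler-integrate state' = deriv(state); return the first component over time."""
    xs = [state[0]]
    for _ in range(steps):
        d = deriv(state)
        state = [s + dt * ds for s, ds in zip(state, d)]
        xs.append(state[0])
    return xs


def sign_changes(xs):
    """Count zero crossings, a crude indicator of oscillation."""
    return sum(1 for a, b in zip(xs, xs[1:]) if a * b < 0)


# First order, one variable: x' = -x. The deviation decays monotonically.
first_order = simulate(lambda s: [-1.0 * s[0]], [1.0])

# Second order, one variable: x' = y, y' = -0.3 * y - 4.0 * x.
# The deviation overshoots equilibrium and oscillates while decaying.
second_order = simulate(lambda s: [s[1], -0.3 * s[1] - 4.0 * s[0]], [1.0, 0.0])

print(sign_changes(first_order), sign_changes(second_order))
```

A single first order variable never crosses its equilibrium, whereas the second order variable does so repeatedly without needing a second, coupled variable.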

Another observation was that the characteristic profile of activations
over time in a first order shunting equation is an abrupt initial
change that slows as it approaches equilibrium, similar to an
exponential decay. This was not commonly observed in the clinical data.
For example, the mood and work factors in Figure 3-9 show an
exponential increase from pre-treatment to the start of treatment and
an exponential decrease after the start of treatment.

These qualitative observations, which are later quantitatively
confirmed, resulted in the choice of a second order system. Figure 3-10
shows some examples of individual patient recovery patterns as
predicted by the model using the optimized parameters. A solid line
corresponds to raw weekly data. A dashed line corresponds to a
prediction from the pre-treatment symptom factors and the optimized
parameters. While these individual fits are rough, they capture the
overall trends of the recovery patterns.

It can be seen from the graphs of patient data in Figure 3-10 that each
individual's time course of response differed greatly from another's.
This made it difficult to evaluate the optimization process visually,
simply by looking at the results of the parameter optimization on the
individual data. As an aid in the visual assessment of the
optimization, and to ensure consistency, the optimization process was
also performed on the mean of the six treatment responders for each
treatment group. Figure 3-11 shows the time course illustrating the
results of the optimization performed on CBT mean data and DMI mean
data. The optimization on the mean data yielded correlation
coefficients of 0.89 and 0.84 between the estimated mean symptom values
and the mean data values in the CBT and DMI groups, respectively.

Statistical Reasons for Choice of Second Order System 
The second order model provided a better fit to the data. The number
of data points fit was 252: 6 patients times 7 symptoms times 6 weeks
(6 x 7 x 6). Table 3.3 shows the F-statistics for the fits to the data
for the two models. The second order model fit the data better than the
first order (shunting) model for the CBT data, but the fit was roughly
the same for the DMI data, with only a slight improvement with the
second order model. The fits were tested to determine whether the fit
was significantly better using the second order model by performing an
r to z transformation and then testing the difference in the z-scores
obtained. The results of these tests are shown in Table 3.4, where it
can be observed that in the case of the CBT data, the goodness of fit
for the second order model was significantly better than for the first
order model. For the DMI data, the difference between the two models
was not significant.

It can also be seen from the table that the second order equations
showed higher correlations for CBT and approximately the same
correlation for DMI data. In a separately conducted simulation which
used splined data (accomplished using the cubic spline interpolation of
deBoor {deBoor:78}), correlation and L_0 were higher for both
treatments. Although not shown in the table, the pattern of
statistical significance was the same. Specifically, the fit of the
second order model was significantly better than that of the first
order model on the CBT data, but not on the DMI data, where there were
no significant differences.

The second order model provides a better description of the data for
both qualitative and quantitative reasons, as discussed above. Detailed
results of the second order system are presented below.

Table 3.3: F-statistic of First and Second Order Models.
F-statistic results for first order and second order systems (79
parameters). Statistics were calculated between the actual data
(linearly interpolated) and the data predicted by the model. r is
Pearson's correlation coefficient, r^2 is the proportion of variance,
F is an F-statistic, and p is the probability for the null hypothesis
to hold. For the calculation of the F-statistic, degrees of freedom
were (N_1, N_2) = (252, 79), where N_1 is the number of predicted
weekly data and N_2 is the number of free parameters. L_0 is the sum of
squares of the difference between the actual data after linear
interpolation and the predicted data, accumulated on a daily basis. For
the simulations underlying these calculations, lambda was set to zero
for the second order system. This ignores errors in the first
derivative, allowing direct comparison of the two models.

Cognitive Behavioral Therapy

System         F      p             r^2     r       L_0
First Order    3.05   < 1 x 10^-5   0.530   0.728   27.0
Second Order   5.36   < 1 x 10^-5   0.664   0.815   17.5

Desipramine

System         F      p             r^2     r       L_0
First Order    1.78   0.00058       0.397   0.630   27.2
Second Order   1.90   0.00016       0.412   0.642   24.7



Table 3.4: Result of r to z transformation and comparison of
significance of differences of the goodness of fit for the first order
versus the second order systems. Subscript 1 indicates the first order
(shunting) system, subscript 2 indicates the second order system, and p
is the significance as a normal deviate.

Significance of Difference of First and Second Order Systems

Treatment   N     r_1     r_2     z_1     z_2     z_1 - z_2   z        p
CBT         252   0.728   0.815   0.924   1.124   0.217       -2.424   0.0152
DMI         252   0.630   0.642   0.741   0.762   -0.020      -0.225   0.8218
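The r to z comparison summarized in Table 3.4 can be reproduced with Fisher's transformation. The sketch below assumes the usual test for two independent correlations, with standard error sqrt(1/(N_1 - 3) + 1/(N_2 - 3)) and a two-tailed normal p value; the text does not spell out the formula, so this is an assumption, and the function names are hypothetical.

```python
import math


def fisher_z(r):
    """Fisher r to z transformation, z = atanh(r)."""
    return math.atanh(r)


def compare_correlations(r1, r2, n1, n2):
    """Return (z, two-tailed p) for the difference between two correlations."""
    diff = fisher_z(r1) - fisher_z(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-tailed normal probability
    return z, p


# Correlations and sample sizes copied from Table 3.4.
z_cbt, p_cbt = compare_correlations(0.728, 0.815, 252, 252)
z_dmi, p_dmi = compare_correlations(0.630, 0.642, 252, 252)
print(f"CBT: z = {z_cbt:.3f}, p = {p_cbt:.4f}")
print(f"DMI: z = {z_dmi:.3f}, p = {p_dmi:.4f}")
```

Under these assumptions the computed normal deviates and p values match the last two columns of the table to within rounding. The transformed value of 0.815 computes to about 1.142, and the corresponding difference to about -0.217, so the printed 1.124 and the unsigned 0.217 in the CBT row may be transcription slips.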
Results: Statistical Inferences

Timing of Symptom Improvement 

In this section, results are presented from the studies that address
timing aspects of the response patterns as predicted by the treatment
models. The timing aspects are based on the derived measure, half
reduction time. Table 3.5 gives the mean half reduction time for each
symptom by treatment. Figure 3-12 provides graphs of these data. The
half reduction times for symptoms subject to cognitive behavioral
therapy (CBT) are shown in the upper portion of the figure and those
for desipramine (DMI) are shown in the lower portion of the figure. The
aspects that were studied were (1) a comparison of when symptoms
were predicted to improve between the two treatments; (2) a comparison
of when symptoms were predicted to improve relative to each other
within a given treatment; and (3) a comparison of the temporal duration
of the predicted symptom response times between the two treatments.

Table 3.5 Half Reduction Time [weeks] statistics computed
from patterns generated by the optimized model.

             CBT                  DMI
Symptom      mean   std    N      mean   std    N
Mood         1.37   1.36   5      2.09   0.57   6
Cognitions   1.51   1.37   3      3.54   0.93   5
Work         2.21   0.49   5      2.67   1.95   5
Anxiety      2.57   0.80   6      3.74   3.24   6
Energy       2.76   1.06   6      2.10   1.48   6
E Sleep      4.63   1.32   3      2.96   1.82   5
M,L Sleep    5.04   1.33   3      3.89   1.15   2
Comparison of Response Times Between Treatments

The response times of symptoms between the two treatments were
compared. A Mann-Whitney U test on the half reduction times of symptoms
(as predicted by the model) was performed. The results presented in
Table 3.6 indicate significant differences in the response times of the
mood and cognitions symptoms (sad mood, thoughts of guilt or suicide,
and anxious mood) between the two treatments. For these symptoms only,
the half reduction times were shorter in the patients who responded to
cognitive behavioral therapy (CBT) than they were for the patients who
responded to desipramine (DMI). Furthermore, as shown in Table 3.6, in
CBT the mood and cognitions symptoms (sad mood, thoughts of guilt or
suicide, and anxious mood) were the first symptoms to respond. There
was no significant difference in the response time of the overall
severity (50% decrease) of the depressive episode for the two
treatments (p = 0.294).

Table 3.6 Mann-Whitney U Tests (two-tailed) for significant
difference in symptom half reduction times of predicted patient
trajectories as derived from group parameterized models. The
distribution was derived by running the same treatment group model
from individual-specific conditions. Half reduction times for CBT and
DMI are given in days. Significance values p are two-tailed.

Half reduction time [days] differences between mean CBT and DMI patients

Symptom              Mann-Whitney U   N_1   N_2   p      DMI   CBT
Cognitions           1                5     6     .008   25    12
Mood                 1                5     5     .016   15    11
Anxiety              4                6     6     .026   21    19
Energy               7                6     6     .094   16    20
Middle, Late Sleep   4                4     5     .190   28    36
Early Sleep          3                4     5     .212   22    32
Severity             14               6     6     .294   22    20 (1)
Work                 12               6     6     .394   20    16

(1) indicates that the mean was computed over all symptoms and over all patients.
These data indicate that cognitive behavioral therapy acts first on
mood and cognitions (sad mood, thoughts of guilt or suicide, and
anxious mood). Moreover, this effect occurs significantly earlier in
patients treated with CBT than in patients treated with DMI. This
early response may be a result of interactions between the patient and
therapist, whereby distorted cognitions (ways of thinking or
interpreting events in the world) are identified, discussed, and
treated. The hypothesis that desipramine may act directly and
initially on the physiological factors energy/retardation is supported
in the data by a trend (p < 0.1).

Sequence of Symptom Improvement Within A Treatment Group 

The sequence, or order in which symptoms improved, was determined
by using the half reduction times that were computed for each symptom.
The half reduction times for both CBT and DMI, in ascending order of
the CBT half reduction times, are given in Table 3.5 and shown
graphically in Figure 3-12. From Figure 3-12 it can be seen that the
order in which symptoms respond, i.e., the sequence of half reduction
times, is different between the two treatment groups. Significant
differences in these sequences are presented in two parts. The first
part (discussed above) shows that some symptoms (cognitions, mood,
anxiety) improve significantly earlier in CBT than in DMI. The second
part (discussed below) shows that within treatment groups there may be
significant differences in the half reduction times of individual
symptom factors.

In patients who responded to CBT, the symptoms improved in the
following order: Mood, Cognitions, Work, Anxiety, Energy, Early Sleep,
and finally, Middle and Late Sleep. By comparison, in patients who
responded to DMI, the order in which symptoms improved was: Mood,
Energy, Work, Early Sleep, Cognitions, Anxiety, Middle and Late Sleep.
In both treatment patient groups, Mood was the first symptom to improve
and Middle and Late Sleep was the last. The initial improvement in Mood
may be due to a non-specific treatment effect, perhaps resulting from
the patient participating in a research study, which could have given
rise to a more hopeful outlook.
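The two improvement sequences follow directly from sorting the mean half reduction times of Table 3.5. The short sketch below reproduces the orderings and the spread between the first and last symptom to respond; the dictionary and function names are illustrative only.

```python
# Mean half reduction times in weeks, copied from Table 3.5.
cbt = {"Mood": 1.37, "Cognitions": 1.51, "Work": 2.21, "Anxiety": 2.57,
       "Energy": 2.76, "Early Sleep": 4.63, "Middle and Late Sleep": 5.04}
dmi = {"Mood": 2.09, "Cognitions": 3.54, "Work": 2.67, "Anxiety": 3.74,
       "Energy": 2.10, "Early Sleep": 2.96, "Middle and Late Sleep": 3.89}


def improvement_order(half_times):
    """Symptoms sorted from earliest to latest mean half reduction time."""
    return sorted(half_times, key=half_times.get)


def spread(half_times):
    """Weeks between the first and the last symptom to respond."""
    return max(half_times.values()) - min(half_times.values())


print(improvement_order(cbt), round(spread(cbt), 2))
print(improvement_order(dmi), round(spread(dmi), 2))
```

The wider spread for CBT than for DMI is the quantity that the later range score analysis examines patient by patient.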
[Figure 3-12: bar chart of mean half reduction time by treatment group
(CBT, DMI). Legend: Mood, Cognitive, Work, Anxiety, Energy, E Sleep,
M,L Sleep.]

Figure 3-12: The model's predicted values of latency for individual
symptom factors. Note that in the DMI patients, the model predicts that
the first symptom latency is 2.09 weeks (mood) and the last symptom
latency is 3.89 weeks (middle and late sleep); thus the range for all
symptoms to respond is 1.8 weeks. In CBT patients, the model predicts
that the first symptom latency is 1.37 weeks (mood) and the last
symptom latency is 5.04 weeks (middle and late sleep); thus the range
for all symptoms to respond is 3.7 weeks of the six weeks studied.
Overlap of Symptom Improvement

Referring now to Figure 3-12, the predicted mean half reduction times
in weeks for seven symptoms in response to two treatments (CBT = a and
DMI = b) are shown graphically. The mean half reduction time is shown
for mood 1210 a, b; cognitive symptoms 1220 a, b; work 1230 a, b;
anxiety 1240 a, b; energy 1250 a, b; early sleep 1260 a, b; and middle
to late sleep 1270 a, b for each treatment, respectively. The numbers
at the end of each bar indicate the time in weeks predicted to be
required to reach the mean half reduction.

The time sequence of symptom improvement was studied in order
to understand whether the symptoms improved at the same time
(concurrently) or one after another (sequentially). The mean half
reduction time for each symptom (Table 3.5 and Figure 3-12) is the time
from the beginning of treatment until the symptom decreases to half its
initial value. This was used to compare the order of symptom
improvement between and within each treatment group (Table 3.7 (CBT)
and Table 3.8 (DMI)).
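The half reduction time defined above can be computed from an equally spaced symptom trajectory by linear interpolation. The following is a minimal sketch under stated assumptions: the function name, the treatment of an absent symptom (initial value of zero) as undefined, and the sample trajectories are hypothetical and are not taken from the patient data.

```python
def half_reduction_time(values, step=1.0):
    """Time at which a symptom first falls to half its initial value.

    values: symptom factor sampled every `step` time units, starting at
    the beginning of treatment. Returns None when the measure is
    undefined, i.e. the symptom is absent or never reaches half its
    initial value within the sampled period.
    """
    if not values or values[0] <= 0:
        return None
    target = values[0] / 2.0
    for i in range(1, len(values)):
        prev, cur = values[i - 1], values[i]
        if cur <= target < prev:
            # Linear interpolation between the two bracketing samples.
            frac = (prev - target) / (prev - cur)
            return (i - 1 + frac) * step
    return None


# Hypothetical weekly trajectories (arbitrary units).
print(half_reduction_time([8, 6, 2]))     # crosses half between weeks 1 and 2
print(half_reduction_time([4, 4, 3, 3]))  # never halves within the period
```

Returning `None` for the undefined case mirrors the way symptoms without a defined half reduction time are omitted from the statistics.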

Statistics were calculated for the CBT and DMI groups separately.
Symptom data that were not predicted to improve over the initial six
week treatment period were omitted, as indicated by the fact that the
number of data points N is less than the number of responders (6) in
Table 3.5. Results are shown schematically in Figure 3-13.

Results presented in Tables 3.7 and 3.8, and depicted in Figure 3-13,
are conservative. To determine the sequence of recovery, symptoms were
first ordered by latency and then examined for significant differences
in latency between each symptom and its nearest neighbor. Where latency
differed by p < 0.05, a split in the sequence was defined. In the CBT
group, there is a significant difference (p = 0.052, taken as
significant at the 0.05 level after rounding) between the half
reduction time for the Energy symptom factor and that for the Early
Sleep symptom factor, thus suggesting two distinct phases of symptom
improvement. Moreover, there was also a trend (p = 0.063) for another
split between Cognitions (thoughts of guilt and suicide) and Work (work
and interests). No significant differences were found between nearest
neighbors in the DMI half reduction time sequence of symptom
improvement, suggesting a concurrent improvement of symptom factors.
CBT Within Group Comparison of individual patient's half times.

Symptom     Anxiety     Cognitions  Mood        Work        Energy      E Sleep     ML Sleep
            n = 6       n = 6       n = 5       n = 6       n = 6       n = 4       n = 5
Anxiety                 C<A .082    M<A .008    .588        .394        .126        A<MLS .020
Cognitions                          .310        C<W .063    C<E .082    C<ES .016   C<MLS .016
Mood                                            M<W .004    M<E .004    M<ES .018   M<MLS .016
Work                                                        W<E .064    W<ES .018   W<MLS .010
Energy                                                                  E<ES .052   E<MLS .010
E Sleep                                                                             .804

For each comparison, N_1 = n and N_2 = n.

Table 3.7 Within group (CBT) comparison of individual patient's half
reduction times as predicted by the model after training. The
Mann-Whitney U test was used to find significance values (p). The p
values reported are two-tailed (with the direction indicated in each
case). If the model predicted non-improvement in the severity of a
symptom, then the value was obtained by omitting these cases from the
calculation and thereby reducing N to n, the reduced number of cases.



DMI Within Group Comparison of individual patient's half times.

Symptom     Anxiety     Cognitions  Mood        Work        Energy      E Sleep     ML Sleep
            n = 6       n = 6       n = 5       n = 6       n = 6       n = 4       n = 5
Anxiety                 .310        M<A .008    W<A .042    E<A .004    .172        .930
Cognitions                          M<C .030    .180        E<C .042    .114        .930
Mood                                            .330        .792        .190        M<MLS .016
Work                                                        .394        .762        .126
Energy                                                                  .352        E<MLS .018
E Sleep                                                                             .412

For each comparison, N_1 = n and N_2 = n.

Table 3.8 Within group (DMI) comparison of individual patient's half
reduction times as predicted by the model after training. The
Mann-Whitney U test was used to find significance values (p). The p
values reported are two-tailed (with the direction indicated in each
case). If the model predicted non-improvement in the severity of a
symptom, then the value was obtained by omitting these cases from the
calculation and thereby reducing N to n, the reduced number of cases.
Figure 3-13 diagrams the time sequence of symptom improvement. The
vertical axis shows the mean half reduction time in weeks; the
horizontal axis has no meaning. Symptom names are enclosed by small
white ellipses and placed vertically at their mean half reduction
time. In CBT responders, there is a significant difference (p < 0.05
after rounding) between the half reduction times of energy and early
sleep disturbance, and there is a trend (p < 0.10 after rounding) for a
split between cognitions and work. In DMI there were no significant
differences (or trends) in the sequence.

This result suggests that the order and timing in which symptoms
improve, one aspect of the recovery pattern, is different for those
patients who responded to CBT than for those patients who responded to
DMI. This could represent a different population, or it could
represent a different method of successful therapy.

The differences in recovery patterns between CBT and DMI reflect
possible differences in the method of action of the different
therapies. The two main differences are (1) it is harder to
distinguish separate groupings for DMI than for CBT, arguing for
concurrent effects in DMI and sequential effects in CBT, and (2)
improvement in the cognitive symptoms (guilt and suicide) and mood
tended to drive the response in the patients who responded to CBT,
whereas improvement in mood, energy, and psychomotor retardation
tended to drive the response in the DMI responsive patients. This
suggests different modes of action of the two treatments.

DMI Symptom Half Reduction Times by Patient 

Table 3.9 shows the average and the individual patients'
half reduction times for each symptom factor as predicted by the
model. Note that n is the number of patients in whom the model predicts
that the symptom will improve by six weeks of treatment. A "--"
indicates that the model predicts the symptom will not improve within
the first six weeks.
Table 3.9: DMI half reduction time [days] as predicted by the model.
Number (n) is the number of patients in whom the model predicts that
the symptom will improve over the six week course, and "--" indicates a
symptom that the model predicts will not improve within the first six
weeks of treatment.

Patient      Anx   Cog   Mood   Work   Energy   E Sle   ML Sle   Severity
155          23    32    21     11     21       16      37       23
157          25    25    13     31     15       --      20       18
165          33    23    17     15     17       --      27       21
167          34    31    --     22     13       33      31       31
175          23    22    12     19     14       13      21       18
181          19    12    10     14     8        21      --       12
Mean         27    25    15     20     16       22      28       20 (1)
Number (n)   6     6     5      6      6        4       5        6

(1) mean computed over all symptoms and over all patients.

CBT Symptom Half Reduction Times by Patient 

Table 3.10 shows the average and the individual patients'
half reduction times for each symptom factor as predicted by the
model. Note that n is the number of patients in whom the model predicts
that the symptom will improve by six weeks of treatment. A "--"
indicates that the model predicts the symptom will not improve within
the first six weeks.

Table 3.10: CBT half reduction time [days] as predicted by the model.
Number (n) is the number of patients in whom the model predicts that
the symptom will improve over the six week course, and "--" indicates a
symptom that the model predicts will not improve within the first six
weeks of treatment.

Patient      Anx   Cog   Mood   Work   Energy   E Sle   ML Sle   Severity
180          19    5     10     13     21       40      --       18
183          18    9     10     17     20       39      42       20
184          11    22    --     18     16       38      35       16
191          30    7     12     19     24       28      37       25
193          18    --    8      12     19       17      27       15
195          12    10    8      14     16       --      --       19
Mean         19    12    11     16     20       32      36       20*
Number (n)   6     5     5      6      6        5       4        6

* mean computed over all symptoms and over all patients.
Range Score: Temporal Duration of Treatment Response

Table 3.11 shows the range scores for each patient in each treatment
group studied. The range score for a patient is the interval
[in days] between the half reduction time of the first symptom to
improve and the half reduction time of the last symptom to improve. To
determine whether the range scores were significantly different for the
two treatment groups, a Mann-Whitney U test was performed. The test
results are shown in Table 3.12, and indicate that the range scores
were not significantly different when symptoms without a predicted
half reduction time were omitted.

Although these samples are very small, there is supporting evidence to
warrant further consideration. Recall that some symptoms were not
predicted to have a half reduction time for some initial data. In
those cases, the symptoms were omitted from the calculation. If,
however, instead of omitting the symptom, the missing value is
replaced by the mean value over all responders from that study, the
results are significant (p = 0.016). Because the sample is so small,
we cannot tell whether or not the two-tailed significance value of
0.132 (obtained when the symptoms are omitted) would be significant in
a larger sample and thus show the range of response times to be
significantly different. While the data do suggest at least two phases
in the action of CBT and only one phase in the action of DMI, no
further conclusions can be drawn at present with this sample.

Table 3.11 CBT and DMI range scores for twelve patients who responded
to CBT or DMI. Values given are the number of days between the first
and last symptoms to reach their half reduction time as predicted by
the model after training. Two range scores are given, whose values
differ only where the model predicted that a symptom would not improve.
The first omits these cases from the range score and the second uses
the mean.

Range Scores

DMI                                        CBT
Patient   Range (omit)   Range (mean)      Patient   Range (omit)   Range (mean)
155       26             26                180       35             35
157       12             12                183       33             33
165       18             18                184       27             27
167       21             21                191       30             30
175       11             11                193       19             19
181       13             20                195       8              28
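The two variants of the range score can be recomputed from the per-patient half reduction times. The sketch below uses the CBT values of Table 3.10 and is only an illustration: the data layout, the use of `None` for a symptom predicted not to improve, and the function name are assumptions, and the column means are taken as printed in the "Mean" row of that table.

```python
# Predicted CBT half reduction times [days] from Table 3.10.
# Columns: Anx, Cog, Mood, Work, Energy, E Sle, ML Sle.
# None marks a symptom the model predicts will not improve in six weeks.
cbt = {
    180: [19, 5, 10, 13, 21, 40, None],
    183: [18, 9, 10, 17, 20, 39, 42],
    184: [11, 22, None, 18, 16, 38, 35],
    191: [30, 7, 12, 19, 24, 28, 37],
    193: [18, None, 8, 12, 19, 17, 27],
    195: [12, 10, 8, 14, 16, None, None],
}
cbt_means = [19, 12, 11, 16, 20, 32, 36]  # "Mean" row of Table 3.10


def range_score(times, fill=None):
    """Days between the first and last symptom to reach half reduction.

    With fill=None, missing symptoms are omitted; otherwise each missing
    value is replaced by the corresponding entry of `fill`.
    """
    if fill is None:
        vals = [t for t in times if t is not None]
    else:
        vals = [f if t is None else t for t, f in zip(times, fill)]
    return max(vals) - min(vals)


omit_scores = [range_score(times) for times in cbt.values()]
mean_scores = [range_score(times, cbt_means) for times in cbt.values()]
print(omit_scores, mean_scores)
```

Both lists reproduce the CBT columns of Table 3.11. Only patient 195 changes between the two variants, because both sleep factors are missing and the substituted means are later than any of that patient's other half reduction times.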
Table 3.12 CBT vs DMI range score differences. The Mann-Whitney U Test for
significant difference was applied to individual patients' range scores. Significance values
($p$) are two-tailed (direction indicated in the Response column). The difference in significance
arises because using the mean as a substitute for a missing half reduction
value does not change $N$, whereas the omission of those symptoms
reduces $N$.


            CBT vs DMI Range Score Comparison

  Mann-Whitney U    N1    N2    p       Response
  3                 6     6     .016    DMI < CBT (mean)
  8                 6     6     .132    DMI < CBT (omitted)

Results: Parameter Choice

This section presents the parameters obtained from the two
treatment models. These parameters reveal differences in the patterns
of recovery for the two treatments. Using the optimized parameters and
the pre-treatment symptom factors for each patient, differences in
parametric choice are discussed.

Latency and Steepness Parameters

Latency and steepness ($\Delta t$ and $\alpha$, respectively) were
optimized over all symptoms over all patients. Optimization of the
second order network's latency parameter $\Delta t$ indicated a 1.2
week latency for treatment with cognitive behavioral therapy (CBT) and
a 3.4 week latency for treatment with the tricyclic antidepressant
drug desipramine (DMI), as shown in Table 3.13.

Steepness of onset of the delayed treatment effect (the parameter
$\alpha$ in the sigmoid function) was very close to 3.0 for both
CBT and DMI (Table 3.13).

Table 3.13 Latency [week] and steepness [week^-1] of the latent effect of
treatment as predicted by the model.

  Parameter    CBT     DMI
  Latency      1.22    3.42
  Steepness    3.00    3.01


The result of the optimization of the model showed that the latency
parameter for CBT was very small (1.2 weeks), whereas the latency
parameter for DMI was much larger (3.4 weeks). This is consistent with
the well established observation {Quitkin:84, Nierenberg:91, Quitkin:93} that
anti-depressant drug treatments can take up to 4 weeks before they
become effective. The goodness of fit was relatively insensitive to the
steepness of the sigmoid function and there was little change from the
initial choice of the parameter.
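To make the roles of the two parameters concrete, the following sketch evaluates a delayed-onset gate at several time points. It assumes a standard logistic sigmoid centered on the latency; the exact functional form used by the model is defined elsewhere in the specification, so this form and the function name are illustrative assumptions. The parameter values are those of Table 3.13.

```python
import math

# Optimized values from Table 3.13 (latency in weeks, steepness in 1/week).
LATENCY = {"CBT": 1.22, "DMI": 3.42}
STEEPNESS = {"CBT": 3.00, "DMI": 3.01}


def delayed_gate(t_weeks, latency, steepness):
    """Assumed logistic gate: near 0 before the latency, near 1 after it."""
    return 1.0 / (1.0 + math.exp(-steepness * (t_weeks - latency)))


# Fraction of the delayed effect that is "switched on" at weeks 0 to 6.
onset = {
    treatment: [delayed_gate(week, LATENCY[treatment], STEEPNESS[treatment])
                for week in range(7)]
    for treatment in ("CBT", "DMI")
}
```

Under this assumed form the gate equals 0.5 exactly at the latency, so the delayed effect of CBT is half engaged at 1.22 weeks while that of DMI is still close to zero at the same time point.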

Treatment Intervention Parameters

The direct effects of CBT and DMI treatment interventions are shown
in Tables 3.16 and 3.17, respectively. To see if the raw data suggest a
significant difference in treatment effects between the two treatment
groups, the improvement rates in severity (Table 3.15, ANOVA results;
Table 3.14, t-test results) after six weeks of treatment were compared.
Although the rates were different for mean overall improvement in
severity (39% for CBT and 57% for DMI), the difference between the two
treatment groups was not statistically significant.
Table 3.14 Symptom factor reduction rates after six weeks of treatment for raw data.
N = number of patients in which the symptom improved. If
the symptom was not present, did not improve, or worsened, it was
excluded from the calculations. sd = standard deviation.

                     CBT                  DMI            Significance level
  Symptom Factor   mean   sd     N     mean   sd     N   of difference (p)
  Anxiety          0.26   0.46   6     0.46   0.51   6   0.504228
  Cognitions       0.15   0.55   5     0.57   0.34   6   0.151835
  Mood             0.31   0.51   6     0.47   0.51   5   0.613145
  Work             0.50   0.50   5     0.75   0.42   6   0.389283
  Energy           0.40   0.24   6     0.43   0.50   6   0.885714
  E Sleep          0.50   0.71   2     0.67   0.58   3   0.788780
  M,L Sleep        0.44   0.10   3     0.06   0.65   6   0.349958



Table 3.15 Severity reduction rates after six weeks of treatment
and results of ANOVA on raw data.

            CBT                              DMI
  Patient   Reduction Rate         Patient   Reduction Rate
  #         (week 6)               #         (week 6)
  180       0.00                   155       0.50
  183       0.59                   157       0.56
  184       0.56                   165       0.52
  191       0.04                   167       0.72
  193       0.48                   175       0.52
  195       0.68                   181       0.59
  mean      0.39                   mean      0.57
  sd        0.30                   sd        0.08

  Source        SS        df    MS         F (p)
  Treatments    0.0936    1     0.09363    1.998 (0.1878)
  Error         0.4686    10    0.04686
  Total         0.5622    11
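The sums of squares and the F ratio in Table 3.15 follow directly from the twelve reduction rates. The sketch below recomputes them with a plain one-way ANOVA; the function name is an assumption, and the p-value is not recomputed because that would require an F distribution routine outside the standard library.

```python
def one_way_anova(groups):
    """Return (SS between, SS within, df between, df within, F)."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Between-group variation: group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Within-group variation: observations around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_ratio


# Week 6 severity reduction rates copied from Table 3.15.
cbt = [0.00, 0.59, 0.56, 0.04, 0.48, 0.68]
dmi = [0.50, 0.56, 0.52, 0.72, 0.52, 0.59]

ss_b, ss_w, df_b, df_w, f_ratio = one_way_anova([cbt, dmi])
```

Run on the tabulated rates, this gives a treatment sum of squares of about 0.0936, an error sum of squares of about 0.4686, and F of about 1.998 on 1 and 10 degrees of freedom.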










The rest of this section focuses on the differences in direct effects
of treatment on symptoms observed in the optimized model parameters.

The second order weight coefficients corresponding to immediate
and delayed direct effects are shown in Figure 3.14. Immediate effects
are presented at the left, delayed effects are presented at the right. In
CBT, the delay itself is very small (1.2 weeks) whereas for DMI, the delay
is much larger (3.4 weeks).

There are two points that should be made. First, for CBT there is not
much difference between immediate and delayed effects on symptoms except
for insomnia, whereas for DMI delayed effects are dominant for cognitions
and mood. Moreover, the delay for CBT is small (1.2 weeks) compared to that
of DMI (3.4 weeks). This indicates that DMI works on cognitions and mood at
a later time than CBT does. Second, the effects of CBT are undifferentiated
among symptoms except Insomnia. Even the difference between Insomnia
and the others disappears after 1.2 weeks. In contrast, the immediate effect
of DMI is greatest on Work, and the delayed effect of DMI is greatest on
Cognitions and Mood. A zero indicates that the model predicted the symptom
would worsen initially.

Figure 3.14 provides a graphical comparison
of the model's predicted (a) immediate and (b) delayed (latent) direct effects
of treatment on symptoms for Cognitive Behavioral Therapy and
Desipramine. A solid line represents CBT coefficient values; a dashed
line represents DMI coefficient values. Symptoms are represented along
the x-axis. The coefficient values from the parameter optimization
procedure indicate the strength of the effect on the symptom at the
time the effect takes place, and are placed on the y-axis. For
example, the delayed effect on cognitions for desipramine occurs at
3.4 weeks with a magnitude of almost 1.5, whereas the delayed effect
on cognitions of CBT takes place at 1.2 weeks and has a magnitude of
about 0.42. A zero indicates that the model predicted the symptom
would worsen initially.



Table 3.16 Coefficients of immediate and latent effects from treatment to symptoms (CBT).

  Symptom Factor    Immediate    Latent
  Anxiety           -0.396       -0.428
  Cognitions        -0.374       -0.424
  Mood              -0.480       -0.171
  Work              -0.538       -0.406
  Energy            -0.309       -0.262
  E Sleep            0.292       -0.289
  M,L Sleep          0.273       -0.684



Table 3.17 Coefficients of immediate and latent effects from treatment to symptoms (DMI).

  Symptom Factor    Immediate    Latent
  Anxiety            0.243       -0.159
  Cognitions        -0.471       -1.469
  Mood              -0.351       -0.916
  Work              -0.752       -0.116
  Energy            -0.386       -0.236
  E Sleep           -0.334       -0.229
  M,L Sleep         -0.115       -0.784
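The statement that the immediate effect of DMI is greatest on Work and its delayed effect is greatest on Cognitions can be read directly off the tabulated coefficients. The short sketch below does this lookup; it assumes that a more negative coefficient means a stronger improving effect, and the helper name is hypothetical.

```python
# (immediate, latent) coefficients copied from the CBT and DMI tables.
CBT = {
    "Anxiety": (-0.396, -0.428), "Cognitions": (-0.374, -0.424),
    "Mood": (-0.480, -0.171), "Work": (-0.538, -0.406),
    "Energy": (-0.309, -0.262), "E Sleep": (0.292, -0.289),
    "M,L Sleep": (0.273, -0.684),
}
DMI = {
    "Anxiety": (0.243, -0.159), "Cognitions": (-0.471, -1.469),
    "Mood": (-0.351, -0.916), "Work": (-0.752, -0.116),
    "Energy": (-0.386, -0.236), "E Sleep": (-0.334, -0.229),
    "M,L Sleep": (-0.115, -0.784),
}

IMMEDIATE, LATENT = 0, 1


def strongest_effect(effects, which):
    """Symptom factor with the most negative (most improving) coefficient."""
    return min(effects, key=lambda factor: effects[factor][which])


dmi_immediate = strongest_effect(DMI, IMMEDIATE)
dmi_latent = strongest_effect(DMI, LATENT)
cbt_immediate = strongest_effect(CBT, IMMEDIATE)
cbt_latent = strongest_effect(CBT, LATENT)
```

For DMI the lookup returns Work for the immediate effect and Cognitions for the latent effect, in agreement with the discussion of Figure 3.14.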



Interaction Parameters

In analyzing the symptom interaction coefficients (see Tables
3.18 and 3.19), the first noticeable difference was in the patterns and
magnitudes of the DMI interaction coefficients between the second order
model and the shunting model. The second order model found stronger
interactions for DMI treatment than the shunting model. This suggests that
the second order system attributed simultaneous improvement to the
interaction loops among symptoms.

Figures 3.15 and 3.16 show interactions among symptoms, together
with the sequence in which the symptoms improved. In these diagrams,
the weights associated with links between nodes represent the
approximated total amount of change at the destination node that was
directly preceded by change at the source node. These values were
calculated by integrating the influence of the source value (an
intervention effect or a factor) on the target value.

Table 3.18 Interaction coefficients among symptoms (CBT). See text for description.

  To / From     Anxiety  Cognitions  Mood    Work    Energy  E Sleep  M,L Sleep
  Anxiety       -0.650   -0.337       0.315  -0.472   0.003   0.255   -0.151
  Cognitions     0.823   -0.535      -1.498  -0.900   0.722   0.569   -0.693
  Mood           0.379   -0.535      -0.636  -0.451  -0.397  -0.135   -0.199
  Work           0.179   -0.363      -0.580  -0.622   0.112   0.094   -0.286
  Energy         0.416   -0.343       0.635   0.126  -1.613  -0.214    0.101
  E Sleep       -0.715    0.341       0.501   0.063   0.864  -1.023    0.480
  M,L Sleep     -1.135    0.318       0.129  -0.402   1.821  -0.048   -0.010

Table 3.19 Interaction coefficients among symptoms (DMI). See text for description.

  To / From     Anxiety  Cognitions  Mood    Work    Energy  E Sleep  M,L Sleep
  Anxiety       -2.980   -0.206       0.931   0.750   0.276   1.059    0.336
  Cognitions    -0.508   -1.095       0.742   0.129  -0.104  -0.445   -0.146
  Mood          -1.022   -0.048      -0.541  -0.373  -0.607   1.030    0.649
  Work           1.163    0.450      -1.358  -1.474  -0.327   0.221    1.030
  Energy        -0.526   -0.762       0.929   0.621  -0.551  -0.523   -0.481
  E Sleep        1.094   -0.222      -0.667  -0.038   0.153  -0.746   -0.114
  M,L Sleep     -1.658   -1.047       1.520   1.119  -0.383   0.161   -0.339

Tables 3.18 and 3.19 show the model's coefficients for the
interactions among the symptom factors. Each column heading identifies
a source symptom which acts upon a target symptom (identified by the
heading of the row). The values of these tables reflect the optimized
coefficients and represent the strength of the interactions among the
symptoms. Negative values indicate that a positive source symptom acts to
improve the target symptom (by reducing its intensity) provided that the
baseline of the source symptom is negative, whereas positive values of
coefficients indicate the opposite. For example, in the case of the patients
who underwent DMI treatment (Table 3.19), the results indicate that
improvement in mood tends to move in the opposite direction from the
work symptom factor because of the negative sign (-). Improvement in
mood also preceded improvement in work. The strength of this
interaction, represented by its coefficient, was -1.358.

The vertical axes shown in Figures 3-15 and
3-16 correspond to the half reduction time [weeks].
Supra-threshold values (i.e., above 0.15) for $W_{ij}$ and
$U_i$, the connections among symptom factors and the connections from the
treatment (CBT or DMI) to each of the symptom factors, are shown in the
sequence diagrams, Figures 3-15 and 3-16, respectively.
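Reading the interaction tables (column is the source, row is the target) and applying the 0.15 threshold can be sketched as below, using the DMI coefficients of Table 3.19. The helper names are hypothetical, and leaving the diagonal (self-interaction) terms out of the link list is an assumption made for this illustration.

```python
FACTORS = ["Anxiety", "Cognitions", "Mood", "Work", "Energy",
           "E Sleep", "M,L Sleep"]

# Table 3.19 (DMI): W_DMI[target][source], rows and columns in FACTORS order.
W_DMI = [
    [-2.980, -0.206, 0.931, 0.750, 0.276, 1.059, 0.336],
    [-0.508, -1.095, 0.742, 0.129, -0.104, -0.445, -0.146],
    [-1.022, -0.048, -0.541, -0.373, -0.607, 1.030, 0.649],
    [1.163, 0.450, -1.358, -1.474, -0.327, 0.221, 1.030],
    [-0.526, -0.762, 0.929, 0.621, -0.551, -0.523, -0.481],
    [1.094, -0.222, -0.667, -0.038, 0.153, -0.746, -0.114],
    [-1.658, -1.047, 1.520, 1.119, -0.383, 0.161, -0.339],
]


def coefficient(w, source, target):
    """Interaction coefficient for the link source -> target."""
    return w[FACTORS.index(target)][FACTORS.index(source)]


def supra_threshold_links(w, threshold=0.15):
    """All (source, target, weight) links between distinct factors above threshold."""
    return [(src, tgt, w[i][j])
            for i, tgt in enumerate(FACTORS)
            for j, src in enumerate(FACTORS)
            if i != j and abs(w[i][j]) > threshold]


def doubly_linked(w, a, b, threshold=0.15):
    """True when both a -> b and b -> a exceed the threshold in magnitude."""
    return (abs(coefficient(w, a, b)) > threshold
            and abs(coefficient(w, b, a)) > threshold)


mood_to_work = coefficient(W_DMI, "Mood", "Work")
links = supra_threshold_links(W_DMI)
```

The lookup for the link from mood to work returns -1.358, the value quoted in the text, and the pair anxiety and early sleep is reported as doubly linked.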

Cognitive Behavioral Therapy

The two main symptoms that improved during recovery in response to
CBT treatment were (1) depressed mood and (2) cognitions. Anxiety and
energy were also improved by the direct effects of the intervention.
Improved mood was followed by an improvement in work and a further
improvement in cognitions. Improvements in sleep disturbances
followed the improvement (reduction) in anxiety. This is shown
graphically in Figure 3-15, where supra-threshold $W_{ij}$ and $U_i$ are shown.

Referring now to Figure 3-15, a graphic representation of the
sequence of symptom factors in recovery with Cognitive Behavioral
Therapy treatment for the second order system is illustrated. Vertical
positions of the symptoms represent half-way-reduction time, arrows
represent strong impacts and interactions, and corresponding numbers
indicate the strength of the impact or interaction.

Desipramine

The weight patterns captured the covariance of the symptom
improvements. For example, the weight pattern of anxiety showed that
it is affected by the mood and early sleep symptom factors. Early
sleep in turn receives its main input from anxiety. This implies a
circular connection, or interaction, between the symptom factors.

As shown in Figure 3-16, depressed mood, work and
interests, and energy were the first symptoms to improve after the
latency. Improvement in mood was followed by improvements in
cognitions, middle and late sleep, and anxiety. An analysis of the
coefficients in the DMI recovery model revealed more "double links"
and recurrent connections than for CBT. When there are recurrent
connections, as soon as one or more symptoms begin to reduce, there
will be a large feedback causing the symptoms inside the loop to
reduce concurrently. In the current case, anxiety and early sleep were
doubly linked, and were also in a loop with depressed mood.

Referring now to Figure 3-16, a graphic representation of the
sequence of symptom factors in recovery with Desipramine treatment for
the second order model is illustrated. Vertical positions of the
symptoms represent half-way-reduction time, arrows represent
strong impacts and interactions, and corresponding numbers indicate the
strength of the impact or interaction. Dotted arrows show the
interactions that operate in loops.

Additional Treatment Effects Due to Model Parameters

The damping factor (parameter $A_i$ in equation 3.6)
reflects the model's tendency to slow down the speed of change of the
symptom factor value. Optimized values for cognitive behavioral
therapy and desipramine treatment are shown in Table 3.20. A clear
finding in the baseline and the decay rate and latency parameters of the
model was that the symptom factor "work" improves strongly in
response to CBT treatment. This improvement was ascribed to a large
immediate effect at the onset of the treatment (large negative value
(-0.752) in Table 3.17). There was also a large negative
self-interaction value which tends to drive the symptom to improve.
The baseline (parameters y in equation 3.6) reflects pre-treatment
symptom factor values. Optimized values for cognitive behavioral
therapy and desipramine treatment are shown in Table 3.21.

Table 3.20 Damping Factors (units of week^-1)

  Symptom Factor    CBT     DMI
  Anxiety           1.33    2.89
  Cognitions        1.91    2.00
  Mood              1.21    1.70
  Work              1.28    2.47
  Energy            1.96    1.66
  E Sleep           1.81    2.09
  M,L Sleep         1.30    1.94



Table 3.21 Baseline (pre-treatment factor values)

  Symptom Factor    CBT       DMI
  Anxiety            0.239    -0.324
  Cognitions         0.231    -0.668
  Mood               0.380    -0.397
  Work              -0.730    -0.062
  Energy             0.062     0.016
  E Sleep           -0.242    -0.542
  M,L Sleep         -0.149     0.807

Limitations of Half Reduction Time Measure

The mean half reduction times of the raw data were correlated with
the mean half reduction times predicted by the model. The results of
the correlation, presented in Table 3.2, show that the correlation between
the raw data values and the model predicted values was significant
for the combined CBT and DMI treatment groups (p<0.01), was highly
significant for the CBT treatment group (p<0.001), and was not
significant for the DMI treatment group. While these results pose
problems in interpreting the model's predictions, there is sufficient
justification for believing that the half reduction times predicted by
the model reflect the actual patient data. For example, the goodness
of fit of the models to the data overall is highly significant,
showing that the models have predictive power for both CBT and DMI
treatments. In addition, the lack of a significant correlation for
DMI may be the result of deficiencies in the half reduction time
measure on this data set. The half reduction time is only defined
when a symptom is present and improves. Any agreement between predicted
and actual lack of improvement, for example, would
result in no defined half reduction times and thus would be excluded
from the computed correlation coefficient. Other measures not
restricted to time of recovery would not suffer from this lack of
robustness when recovery is not present.
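The limitation described above can be illustrated with a small sketch of how a half reduction time might be computed and how undefined cases drop out of a comparison. The linear interpolation between weekly ratings, the test for symptom presence (a positive baseline score), and all names and example values are assumptions of this sketch rather than the definition used in the study.

```python
def half_reduction_time(times, scores):
    """Time at which a symptom first falls to half of its baseline value.

    Returns None when the symptom is absent at baseline or never halves,
    which mirrors the cases in which the measure is undefined.
    """
    baseline = scores[0]
    if baseline <= 0:
        return None  # symptom not present at baseline (assumed criterion)
    target = baseline / 2.0
    points = list(zip(times, scores))
    for (t0, s0), (t1, s1) in zip(points, points[1:]):
        if s0 > target >= s1:
            # Linear interpolation between the two bracketing ratings.
            return t0 + (s0 - target) * (t1 - t0) / (s0 - s1)
    return None


def defined_pairs(actual, predicted):
    """Keep only pairs in which both half reduction times are defined."""
    return [(a, p) for a, p in zip(actual, predicted)
            if a is not None and p is not None]


weeks = [0, 1, 2, 3]
improving = half_reduction_time(weeks, [4, 3, 1, 1])   # halves between weeks 1 and 2
unchanged = half_reduction_time(weeks, [4, 4, 5, 4])   # never halves
absent = half_reduction_time(weeks, [0, 0, 0, 0])      # not present at baseline

# A case where actual and predicted both show no improvement contributes
# nothing to the correlation, because the pair is dropped.
pairs = defined_pairs([improving, unchanged], [1.4, None])
```

In this hypothetical example the second symptom is correctly predicted not to improve, yet the agreement is invisible to any correlation computed on the remaining pairs.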

Other Limitations

The current pilot study has many technical limitations. First, this
model does not distinguish transient from permanent effects of
treatment. Data subsequent to the termination of treatment were not
available for either study (CBT or DMI). Second, the current method
only partly distinguishes the order of the recovery from the
causal sequence of the recovery. They are distinguished in the cases
where a correlation method can distinguish them. For example, assume
factors A, B, and C in Figure 3-17 improved in this
order. We cannot tell by just looking at the sequence whether A or B
independently or jointly caused C to improve. In an attempt to
differentiate sequence and causality we examined our correlation
coefficients using the following logic.

If the correlation coefficient reflecting the rate of change in the
improvement rate of C indicated high correlation with a low value of A
(as indicated by thick arrows) but not B, it suggests that only the
improvement in A caused the improvement in C. In the second order
system, this can be evaluated by looking at interaction coefficients
$W_{ij}$ as follows: Consider a negative interaction coefficient with
a large absolute value. During the time when the value of the source
is lower than the mean, the second derivative of the target tends to
be negative. That reduces the first derivative of the target factor,
and eventually the target factor decreases. However, causality
and correlation cannot be distinguished when the patterns of recovery
of A and B are nearly identical, or when the interactions do not
manifest themselves in a second differential form. A more fundamental
problem exists when there is an unmeasured factor D affecting A and
after some time affecting C, thus creating a false correlation from A
to C. This can be teased apart only by showing that fluctuation added
at A affects B but fluctuation added at B does not affect A. This type
of analysis is not incorporated in the current research. Third, the
current method does not incorporate stochastic analyses, which are
commonly done in standard time series analysis. Incorporation of such
more powerful methods requires a larger amount of data than was
available for the current research, and could be undertaken in future
research.

Referring now to Figure 3-17, an illustration of sequence and causal
relationships among patterns of recovery is shown. Three curves (A, B, and
C) in the graph show examples of hypothetical recovery patterns. Thick
arrows show sequential relationships that can be captured by the
current method.

There was no difference seen in overall (severity) response times.
In both groups mood was the first symptom to improve and
middle/late sleep was the last.

Symptom improvement sequence clustered differently in the two
treatments. The cognitive and mood symptoms (sad mood, thoughts of
guilt or suicide, and anxious mood) improved significantly earlier
(p<0.05, two-tailed) in CBT than in DMI.

The recovery pattern for cognitive behavioral therapy tends
to group into two phases with a trend of a third phase, whereas for
desipramine, the recovery pattern does not group into phases. The
desipramine response also shows a significant delayed effect, not
found in the cognitive behavioral therapy response.

The results presented demonstrate that models of
clinical recovery derived from data on different treatments predict
different recovery patterns. Patterns predicted from baseline values
of patients treated with cognitive behavioral therapy showed early
improvement in sad mood, thoughts of guilt or suicide, and anxious
mood when compared to the recovery patterns predicted from the initial
data of patients treated with DMI. Given that the overall severity
improved at the same rate in the two groups, but the cognitive factors
did not, it may be beneficial to consider a combined treatment for
patients who are at a high risk for suicide, as described below.

The analyses identified which symptoms are affected and
when they are affected in response to two different treatment
interventions. This information could be utilized during treatment to
monitor deviations from the standard time course. In the case of CBT,
it may help to determine whether it is necessary for a specific
symptom factor to improve before another, and to identify the various
stages of the recovery process in CBT. For example, to recover in
work and activities, the patient may first need to show improvement in
mood and depressogenic cognitions.

Implications for the Treatment of Suicidal Patients

After the onset of treatment, the duration of time required to capture
change in all of the symptom factors is shorter for DMI (3.9 weeks in
half reduction time) than for CBT (5.0 weeks). However, crucial
factors for suicidal patients are the cognitions (guilt and suicidal
thoughts) and mood (sadness), and these factors are improved by CBT
earlier (1.5 and 1.4 weeks respectively in half reduction time) in the
course of treatment than they are by DMI (3.5 and 2.1 weeks). Note
that the cognitions factor responded much earlier in the sequence when
treated with CBT than with DMI, and the cognitive symptoms (anxiety,
guilt/suicide, and mood) all responded more quickly to CBT treatment
(p<0.05). This suggests that patients who report hopelessness and
suicidal thoughts may benefit from either CBT alone or a combined
treatment of CBT and DMI. However, this interpretation is only made
with respect to moderately depressed patients in a typical out-patient
sample, and its applicability to severely depressed or hospitalized
patients is not known. No severely suicidal patients were included in
the sample, and those who did have suicidal symptoms gave assurance
that they would not act on their thoughts during the study. Thus, this
suggestion is speculative and awaits confirmation by further study.

Prediction of Outcome From Baseline

Two nonlinear methods are shown to both perform significantly
better than multiple linear regression. Multiple linear regression is shown
to perform at chance levels, while both a nonlinear neural network
model and a nonlinear quadratic regression model perform at significantly
above chance levels. This suggests that (1) important nonlinear
relationships are present in the data, and (2) the particular nonlinear
method employed is not as important as its ability to model complex
relationships in the data. Since quadratic regression performs about as
well as backpropagation, it appears to be the interaction among variables,
i.e., the nonlinearities, that is responsible for the increase in predictive
performance. Consequently, clinical researchers can use their
current regression methods to reanalyze their existing data, exploiting
this new knowledge.

A predictive relationship (mapping) between
pre-treatment symptoms, either individually or collectively, and
treatment outcome was investigated. One clinical data set was analyzed
with each of multiple linear regression, neural network modeling, and
quadratic regression to determine the predictive value of each of the
aforementioned methods.

Three subproblems arose. First, the methods use different numbers of
parameters and thus there was an inequity in the comparison.
Second, the nonlinear methods required more parameters than the linear
methods. This created problems of over-fitting in cases of sparse
data. Finally, the data contained irregularities resulting from
limitations of the instrument from which they were obtained.
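The difference in parameter counts between the linear and quadratic models can be made concrete with a feature expansion sketch. The expansion below (constant, linear, squared, and pairwise product terms) is one common form of quadratic regression and is assumed here for illustration; the source does not spell out the exact terms used.

```python
from itertools import combinations_with_replacement


def quadratic_features(x):
    """Expand predictors into constant, linear, and all second-order terms."""
    features = [1.0] + list(x)
    # Squares and pairwise products carry the interactions among variables.
    features += [a * b for a, b in combinations_with_replacement(x, 2)]
    return features


def n_params_linear(k):
    """Coefficients in a linear model with k predictors plus an intercept."""
    return k + 1


def n_params_quadratic(k):
    """Coefficients in the full quadratic model with k predictors."""
    return 1 + k + k * (k + 1) // 2


example = quadratic_features([2.0, 3.0])

# Seven symptom factors versus the 21 individual HDRS items.
counts = {k: (n_params_linear(k), n_params_quadratic(k)) for k in (7, 21)}
```

With seven symptom factors the quadratic model has 36 coefficients against 8 for the linear model, and with all 21 HDRS items it has 253 against 22, which illustrates why over-fitting becomes a concern when data are sparse.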

RELATED RESEARCH

Problems with previous methods of analysis included: (1) the findings
on outcome prediction from baseline clinical symptoms are inconclusive
and sometimes inconsistent among different researchers; (2) the majority
of findings resulted from analyses using only linear methods; and (3)
evidence exists for nonlinear relationships between clinical variables and
outcome.

The studies used to arrive at the above problems had to (1)
use HDRS symptoms or severity as a potential predictor and one of the
following outcome measures to have comparable dependent variables: (a)
final HDRS, (b) improvement in HDRS score, (c) improvement ratio in HDRS,
or (d) a categorical measure based on these continuous measures; (2)
use one of the following treatments to have comparable independent
variables: (a) cognitive behavioral therapy, (b) desipramine, or (c)
fluoxetine; (3) use short term placebo controls to show clear effects;
and (4) be evenly distributed demographically (age, sex, etc.) to
reduce bias in the comparison sample.

Summary of Findings

Table 4.1 summarizes reports (1986-1994) of attempts to
predict outcome from baseline clinical variables. The clinical
variables considered here as potential predictors of outcome were
either (a) one or more of the 21 baseline HDRS individual item
severity scores or (b) the baseline HDRS total severity score (overall
depression severity). These clinical variables are listed in the first
column of Table 4.1 under the heading Symptoms. The
remainder of the columns identify the treatment administered as part
of the various research studies. Each entry in Table 4.1
is an index into Table 4.2, which gives the reference. When
a clinical variable was reported to be predictive of outcome
(p<0.05), the number identifying the study is underlined. When the
clinical variable was found to be not significant in predicting outcome, it
is not underlined. A blank entry indicates that the predictive power of the
clinical symptom was not reported.

Of 19 accounts in which the predictive value of severity was
evaluated, 11 found it to be predictive with statistical significance.
(For the purpose of maintaining readability, citations are not
included in this subsection. To find references, consult Table
4.1 and Table 4.2.) As for individual symptoms, 3 of 13 findings reported
found depressed mood to be a predictor, 2 of 2 for late insomnia, 2 of 8 for
somatic-gastrointestinal, 1 of 11 for work and interests, 2 of 14 for
retardation, 2 of 12 for middle insomnia, 1 of 13 for weight change, 1 of 7
for insight, 1 of 10 for hypochondriasis, and 1 of 14 for agitation. No
other independent symptoms were found to be significant predictors in
the literature considered here.

Focusing on each treatment, it can be seen that: for amitriptyline (Ami),
increased overall severity (1 of 1), depressed mood (1 of 2), middle
insomnia (1 of 1), somatic-gastrointestinal (1 of 1) and
hypochondriasis (1 of 1) predicted poorer response, whereas increased
severity in insight (1 of 1) predicted better response; for imipramine
(IMI), greater overall severity (3 of 4) predicted both better response
(2 of 3) and poorer response (1 of 3), greater depressed mood (1 of 2)
predicted poorer response, and greater late insomnia (1 of 2) and greater
retardation (1 of 3) predicted better response; for tranylcypromine
(Tran), greater depressed mood, greater retardation, and greater weight
change predicted better response, while greater middle insomnia and
greater late insomnia predicted poorer response (1 of 1 each, from
the same paper); for electroconvulsive therapy (ECT), greater overall
severity (1 of 1), greater depressed mood (1 of 1), greater work and
interests (1 of 2), greater agitation (1 of 2), and greater
somatic-gastrointestinal (1 of 1) predicted poorer response; for
interpersonal therapy (IPT), greater overall severity predicted poorer
response (1 of 1), and individual symptoms were not reported; for
maprotiline (Map), greater overall severity predicted both better and
poorer response (1 of 2 each), and individual symptoms were not reported;
and for levoprotiline (Lev), greater overall severity predicted poorer
response (1 of 1), and individual symptoms were not reported.

Overall severity at baseline was found to be predictive throughout
many treatment studies. At least one study for each treatment found
overall severity at baseline to be significant except desipramine (0 of 2),
clomipramine (0 of 2), fluoxetine (0 of 1), and cognitive behavioral
therapy (0 of 1). However, baseline severity was not a consistent
predictor of outcome, being confirmed only by 11 of 19 accounts (from
thirteen studies). Much less predictive reliability at baseline was found
in individual symptoms.

Severity as a Predictor

Baseline HDRS severity alone was found to be inconclusive as a
predictor of general response to treatment because it was found both
to be a significant predictor of response and also to not be a
significant predictor of response. Examples from the literature
follow.

Of thirteen studies, nineteen accounts of tests for baseline HDRS
severity as a predictor of outcome were reported. Eleven accounts (in
seven of the studies) found baseline severity to be statistically
significant
{Katz:87,Pande:88,Sotsky:91,Vallejo:91,Filip:93,Hoencamp:94,Katon:94}
and eight accounts (in six studies)
{Kocsis:89,Nagayama:91,Bowden:93,Hinrichsen:93,Johnson:94,Joyce:94}
did not.

Of the eleven accounts that found overall severity to predict
response, five found greater severity to predict better response and
six found greater severity to predict poor response to treatment.
Vallejo et al. {Vallejo:91} found that the more severe the depression (baseline
HDRS total), the better the outcome (percent reduction in HDRS) in a
study of 116 out-patients treated with imipramine (N=89) or phenelzine
(N=27), evaluated at outcome, 6 weeks ($r=0.22, p=0.015$), and also at
a 6 month follow-up ($r=0.20, p=0.029$). Higher baseline HDRS
severity was also found to indicate increased chance of recovery by
Hoencamp et al. {Hoencamp:94}, in a three-phase sequential medication
study (maprotiline ($N=119$), lithium augmentation/brofaromine ($N=51$),
maprotiline and lithium ($N=22$)), ($B=0.31$, $p<0.001$).

In contrast, severity was not found to be significant when the
clinical efficacy of fluoxetine and desipramine was compared in a
double blind parallel group study of major depressive disorder
(including both in-patients and out-patients) {Bowden:93}. The
clinical responses of severely ill patients (those with baseline HDRS
scores of 24 or greater) were compared to those of moderately depressed
patients (those with baseline HDRS scores less than 24). No
significant differences were found between the drugs when compared
across severity categories and no significant differences between the
two drugs were found when compared within severity
categories.

Baseline severity did not significantly correlate with percent
improvement or final severity score in 104 patients who participated
in a study designed to examine predictors of short-term response to
desipramine and clomipramine {Joyce:94}, and baseline severity
was not found to be a significant predictor of outcome at the 4-month
follow-up in patients with major depression; antidepressant treatment
was not specified {Katon:94}.

This lack of clear predictive results for severity is not surprising
because severity is nonspecific with respect to symptoms. Different
syndromes of equal overall severity may respond to different
treatments. For example, Elkin et al. {Elkin:89} were only
able to find significant differential treatment response to cognitive
behavioral therapy (CBT), interpersonal therapy (IPT), imipramine with
clinical management and placebo with clinical management in a
secondary analysis. When the population was analyzed based on
baseline severity, those patients who were less severely depressed
(HDRS score totals less than 20) showed no significant difference in
their response to treatment. The more severely depressed (HDRS score
totals greater than or equal to 20) responded best to imipramine with
clinical management and worst to placebo with clinical management.
Responses to CBT and IPT were in between, but closer to imipramine with
clinical management, with the response to IPT better than the response
to CBT.

In addition, the lack of reliability in baseline severity as a
predictor of outcome could also be due to different outcome measures
(see below), differences that result from treatment-specific
responses, population differences, such as demographics, as well as
the independent variables chosen to be tested as predictors.

Individual Symptoms as Predictors of Outcome

None of the studies that met the criteria for comparison found
individual symptoms to be predictive of outcome. On the other hand,
four related studies found seven symptoms to be predictive
{White:86,Katz:87,Pande:88,McGrath:92}. Of these, depressed mood
was the most frequent and occurred in four of the findings; middle and
late insomnia each occurred twice; and
gastrointestinal-somatic, work and interests, retardation, agitation,
hypochondriasis, weight loss, and insight each occurred once (see
Table 4.1). Individual symptoms were also predictive of
outcome in an amitriptyline study of depression: Sauer et al. {Sauer:86}
found that moderate late insomnia (p=0.035) and poor insight (p=0.025)
predicted a better response, whereas severe middle insomnia (p=0.031),
gastrointestinal symptoms (p=0.046) and hypochondriasis (p=0.017)
predicted a poorer response (N=50).

For example, prediction of outcome from symptoms has been demonstrated
in atypical depression. Atypical depression is characterized by
depressions where a group of symptoms (behaviors) are the opposite of
what is commonly observed in typical depressions. Features of
atypical depression are oversleeping, overeating, severe lack of
energy, and pathologic rejection sensitivity. McGrath et al. (1992)
{McGrath:92} showed that atypical depression patients showed a
clear and consistent pattern of poorer response to imipramine.

In addition, when over-sleeping and leaden paralysis were both
present, these two symptoms (in addition to the atypical symptoms)
significantly predicted less improvement with imipramine. In this
example, the symptoms of atypical depression predict poor response to
imipramine. The dynamics of atypical depression seem to indicate
nonlinearity. No one symptom accounted for the poorer response to
imipramine; severity in any one of the four atypical symptoms
(oversleeping, overeating, severe anergy, and pathologic rejection
sensitivity) produces the effect {McGrath:92}. Furthermore, the
presence of more than one symptom does not increase the differential
effect.

Outcome Measure

Apparent inconsistencies may be due to the outcome measure used
{Filip:93}. The findings of Filip et al. {Filip:93} and Popescu et al.
{Popescu:93} provide an example of (a) a case, within one study, where
results are significant using one outcome measure and not significant
using a different outcome measure and (b) a case where using one
outcome measure the results of two studies are consistent, but using a
different outcome measure their results are inconsistent. They report
that baseline HDRS is predictive of outcome when outcome is
defined as final HDRS score (either levoprotiline or maprotiline,
N=55, r=0.51, p<0.0002; N=108, F=5.66, p<0.01, respectively).
However, when outcome is defined as percent change in
HDRS, their findings are inconsistent. Filip et al. found that
baseline HDRS was not a significant predictor of outcome, whereas
Popescu et al. found that the less severe patients (those with lower
baseline HDRS scores) were more likely to respond to [an unspecified]
tricyclic antidepressant treatment (N=108, F=20.12, p<0.01). Although
Filip et al. argue that the final score is most consistent with the
physician's judgment, in an attempt to prevent this potential
inconsistency, the results were compared only to those results that were
obtained using the same outcome measure used in the data, i.e.,
percent change in HDRS, for the purposes of review of significant
findings reported in Table 4.1.

Evidence of Nonlinearities 
The specific findings are reviewed that led to the belief that
nonlinear relationships may exist between clinical symptoms,
treatment, and outcome and therefore should be explored in the attempt
to predict outcome from baseline clinical symptoms. In particular,
the studies reviewed in this section suggest the presence of two
different types of nonlinear relationships which may help to explain
the inconclusive and sometimes inconsistent results found in the
literature. The first type of nonlinear relationship would be
differences observed across different treatments, indicating
treatment-specific responses for subsets of symptoms. The second
type of nonlinear relationship would be nonlinear relationships
observed within a given treatment. Evidence for both these types of
nonlinearities exists in the data. If the relationships were linear,
either within or across these treatment groups, a separate linear
model would be needed for each one. Using a nonlinear model, it may
be possible to capture relationships in a single model given some
overlap of effects. Also, a nonlinear model would be able to capture
curvilinear relationships between symptoms and outcome for a given
treatment.

Nonlinearities Across Treatments

When one looks at symptoms (across a row) of Table 4.1, it appears
that for any given symptom, symptoms in general, i.e., symptoms
independent of the treatment administered, were not found to be
significant predictors of outcome. In contrast, when one looks within
a given treatment (down a column), it appears that treatments may have
specific symptom profiles (combinations of symptoms) that when taken
together are significant predictors of outcome for that treatment.

In a treatment-specific response relationship, the treatment acts as a
switch, selecting for a set of symptoms which may be different from
those symptoms that another treatment might select. In a response
within a given treatment, the response may indicate effective ranges
of symptom severity for which the treatment is effective.

For example, looking across symptoms, we see inconsistent findings
for many of the symptoms. Increased severity of depressed mood, depending
on treatment, was found to positively predict outcome for
tranylcypromine, to negatively predict outcome for amitriptyline,
imipramine, and electroconvulsive therapy (ECT), and to not be predictive
of outcome for S-adenosyl methionine, imipramine, desipramine,
clomipramine, fluoxetine, and cognitive behavioral therapy. Increased
severity of middle insomnia predicted poor response for amitriptyline
and tranylcypromine, but was not predictive of response for any other
treatment reported. Increased late insomnia predicted favorable
response for imipramine, poor response for tranylcypromine, and did
not predict response for all other treatments reported. Greater
severity in the work and interests item predicted poor response for
ECT only; increased severity of retardation predicted favorable
response for both imipramine and tranylcypromine; increased severity
in the somatic-gastrointestinal symptom predicted poor response for
only amitriptyline and ECT; increased hypochondriasis predicted poor
response for tranylcypromine; and lack of insight (increased severity
of insight symptom) predicted positive response for amitriptyline
only. In most of the findings reported, symptoms were not predictive
of outcome. When symptoms were reported to predict outcome, most were
not consistent across treatments in that the same symptom predicted
opposite effects.

This review focused on reports of attempts to predict outcome from
baseline HDRS symptoms. Therefore, other reports showing
consistent results using other instruments were excluded from
review for reasons of comparability. Thus, the entries in the
table under each treatment may not be representative of the entire
literature and further treatment-specific consistent patterns might be
apparent with a broader survey.

In Table 4.11, it will be shown that the interaction effects of
severity and thoughts of guilt and suicide (Cog.Severity), severity and
anxiety (Anxiety.Severity), and severity and early sleep disturbance
(ESleep.Severity) seem to be highly significant for the prediction of
outcome in a heterogeneous sample of patients treated with desipramine,
fluoxetine, or cognitive behavioral therapy. Furthermore, nonlinear
interaction effects yield the most significant results for these data.
In addition, backpropagation with treatment included in the input variables
gives the most highly significant result across these data. The treatment
type may select for overlapping syndromes responsive to a particular
drug or psychotherapy. The interaction terms suggest different
syndromes such as learned helplessness or anxious depression.
Crossing the nonspecific independent variable severity with a specific
symptom factor may help to identify these syndromes. We did not have
the data available to validate the results, but the reports reviewed
in this chapter indicate that different symptoms may predict outcome
to different treatments. For example, the combination of severity,
late insomnia and retardation may predict response to imipramine; the
combination of depressed mood, middle and late insomnia and change in
weight may predict response to tranylcypromine; and the combination of
severity, depressed mood, work and interests, and
somatic-gastrointestinal may predict response to electroconvulsive
therapy.

Nonlinearities Within Treatment Response 

Nonlinearities which are induced by U-shaped relationships between 
symptoms and treatment response are considered herein. 

Joyce et al. {Joyce:89} reported that those with an
intermediate level of severity respond best to treatment with
tricyclic antidepressants. Thus, those with either very mild
depressions or very severe depressions do not respond well, suggesting
a nonlinear relationship within the tricyclic antidepressant drug
family.

Furthermore, endogenous depressions have been reported to respond
better to tricyclic antidepressants than nonendogenous depressions
{Joyce:89,Paykel:72,Raskin:76}. There are conflicting findings
{Joyce:89,Simpson:76} which could be explained by curvilinear
relationships between endogenous symptoms and amitriptyline response
{Joyce:89,Aboul-Saleh:83}.

Neurotransmitter metabolite data from blood, urine, or plasma were
not included in this study. However, Samson et al. {Samson:94} found both
high and low urinary 3-methoxy-4-hydroxyphenylglycol (MHPG) levels to be
characteristic of late insomnia, and postulated that this may indicate a
nonlinear relationship between symptoms of depression and underlying
biochemical abnormalities.

The aforementioned studies suggest that the statistical significance
of severity is inconsistent throughout the literature reviewed. However, it
appears that one or more of the following reasons may be contributing
factors to this inconsistency: (1) statistical effects of different
populations; (2) different outcome measures; (3) comparison across
treatment groups which might be selective for subpopulations with
different symptom profiles but the same overall baseline HDRS severity
score; (4) curvilinear relationships between independent and dependent
variables. Thus, inconsistencies in the predictive value of severity appear to
be largely due to differences between studies. In addition, the data
summarized above suggest a consistent response to different drugs. A
broader review would be necessary to substantiate these results.

METHODS 

There are three categories of methods presented in this section. 

The procedure used in the comparison of linear and nonlinear methods
was as follows. First, the independent (input) and
dependent (output) variables were selected. These were the same seven symptom
factors and severity that were used in the study discussed above.
There were two reasons for this choice: (a) to
maintain consistency with Study 1 above, which would facilitate
integration of these results; and (b) the data available were too few for
each of the HDRS items to be allocated separate independent variables
without over-fitting the data. Next, the best population
distribution to assume was selected. The backpropagation algorithm and
multiple linear regression were applied to the original data and to data that
were rescaled based on normal, exponential, and gamma
distributions. Finally, seven data sets were created: three from the
individual treatment groups (CBT, DMI, and FLU), and four
combinations: drug only (DMI and FLU) and all treatments (CBT, DMI,
and FLU), both with and without an independent variable to indicate
treatment. Also described below are the methods used to address the
three subproblems mentioned above, i.e., different numbers of parameters
in the models, dependence on sample size, and irregularities in the data.

The Models

Three mathematical models were investigated: a neural network, multiple
linear regression, and quadratic regression.

Backpropagation 

To evaluate the ability of a nonlinear neural network method to
predict response to treatment from a set of symptoms and treatment, a
network algorithm called backpropagation was chosen
{BryHo:69,Werbos:74,Rummelhart:86}. The backpropagation
algorithm is based on gradient descent, which changes the weights of
the network to learn a mapping between input and output vectors. A
backpropagation network was chosen for the following reasons: it is a
widely used and accepted neural network architecture; the software is
readily available from multiple sources; and it is simple to use and
relatively easy to interpret. Standard and accepted
techniques were utilized in order to make the analyses easily reproducible
by others.

A three-layer backpropagation network model with two hidden units
was used in this study. The input layer had one of four configurations
(i.e., number of input nodes) depending on the data set (discussed in
the section Compensation for Sample Size). For all data sets without
inclusion of the treatment as one of the inputs, the number of input nodes
was eight: one for each of the seven symptom factors and one for the
severity of symptoms. When treatment information was included, each
treatment was allocated an individual input node, which was set to one if
the patient received the treatment and to zero if the patient did not. No
patient received more than one treatment in any of the three studies.
Thus, for the data set that combined the two drug studies, the number of
input nodes was ten: the seven symptom factors, the severity, and the two
additional nodes allocated to flag the treatment the patient received. The
data set that combined all three treatment groups had eleven input nodes.
The output layer had one node, representing the response of the patient.
The transformation function at the output layer was chosen to be
linear, as that gave the best results. In a few instances, where
noted, a logistic function was used on output; unless otherwise
specified, the linear function can be assumed. The input and
output representations are described in the section Data Representation.
The threshold at the output node was set to 0.5. Activation above
threshold was interpreted as predicting a responder, and activation
below as predicting a non-responder. The prediction was then compared
with the calculated category from the data to determine whether the
network's prediction was correct.
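
The input configurations and the output threshold described above can be sketched as follows. This is only an illustrative sketch: the function names, the treatment labels, the ordering of the flag nodes, and the handling of an activation exactly equal to the threshold are assumptions, not details taken from the study.

```python
# Hypothetical labels and ordering for the treatment flag nodes.
TREATMENTS = ("DMI", "FLU", "CBT")


def encode_input(symptom_factors, severity, treatment=None, treatments=TREATMENTS):
    """Build one input vector: seven symptom factors, severity, then
    (optionally) one binary flag node per treatment in the data set."""
    if len(symptom_factors) != 7:
        raise ValueError("expected seven symptom factors")
    vector = [float(x) for x in symptom_factors] + [float(severity)]
    if treatment is not None:
        if treatment not in treatments:
            raise ValueError("unknown treatment: %r" % (treatment,))
        # Exactly one flag is 1 because no patient received two treatments.
        vector += [1.0 if t == treatment else 0.0 for t in treatments]
    return vector


def classify(activation, threshold=0.5):
    """Activation above threshold predicts a responder. An activation exactly
    at the threshold is treated as a non-responder here (an assumption)."""
    return "responder" if activation > threshold else "non-responder"
```

With these assumptions, a data set without treatment flags yields eight inputs, the two-drug data set with flags yields ten, and the three-treatment data set with flags yields eleven.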

Referring now to Figure 4-1, which illustrates the nonlinear mapping of
backpropagation, each hidden node finds a direction in the input space
(illustrated by an arrow perpendicular to a small square piece) to which
the output is sensitive. The output of each hidden node goes through a
nonlinear output function before being weighted and summed at the output node.
In the nonlinear backpropagation neural network model, the
backpropagation algorithm was expected to find any subset of inputs
that were predictive of the outcome and modify its connection weights
in order to map their values to the predicted outcome, even when the
relationship between them is nonlinear. In a backpropagation network,
this is made possible in the following manner. Hidden nodes in a
backpropagation network find important subspaces which are determined
by input weight patterns (see Figure 4-1). Output values
of hidden nodes are transformed by a nonlinear function, and the
degree of nonlinearity depends on the magnitude of the input weights
and the size of the bias input to each hidden node. These values are
weighted and summed at the output node, where the output
function is applied.
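
A minimal sketch of the forward pass through such a network is given below. The logistic function at the hidden nodes is an assumption (the text states only that the hidden function is nonlinear), and the function and parameter names are hypothetical.

```python
import math


def logistic(x):
    """Logistic squashing function (assumed form of the hidden nonlinearity)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, hidden_weights, hidden_biases,
            output_weights, output_bias, linear_output=True):
    """Forward pass of a three-layer network with a single output node.

    hidden_weights holds one weight list per hidden node; each hidden node
    projects the input onto its weight direction, adds its bias, and squashes.
    """
    hidden = [
        logistic(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    net = sum(w * h for w, h in zip(output_weights, hidden)) + output_bias
    # Linear output by default; logistic output only where noted in the text.
    return net if linear_output else logistic(net)
```

Larger input weights push a hidden node further into the saturated part of the logistic curve, which is one way to read the statement that the degree of nonlinearity depends on the magnitude of the weights and the bias.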

Regression coefficients were calculated by a standard procedure: LU 
decomposition with Gaussian elimination using partial pivots. 
Backpropagation Training Procedure

Training a backpropagation network model involves two steps. The
first adjusts model parameters which determine the behavior of the
training algorithm. The second specifies the criteria for termination
of training. Based on preliminary tests, the following parameter
settings were chosen for all trials: the learning rates of the
weight modification rules were set to 0.01 (i.e., for both input to
hidden and hidden to output); the momentum, which determines the
effect of the previous weight change on the current weight change, was
set to 0.9; the squashing function at the output node was set to
be linear; the temperature for the squashing function was set to
1; and training was terminated after 10,000 epochs. See
{Hertz:91} for definitions of these terms within the
backpropagation framework.
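
The role of the learning rate and momentum can be sketched with the common textbook form of the momentum update, shown below. This is an assumed form; the exact update rule used by the software in the study is not given in the text, and the function name is hypothetical.

```python
def momentum_step(weights, gradients, previous_changes,
                  learning_rate=0.01, momentum=0.9):
    """One gradient-descent weight update with momentum.

    Each new weight change is the negative gradient scaled by the learning
    rate plus a fraction (the momentum) of the previous weight change.
    """
    changes = [
        momentum * prev - learning_rate * grad
        for prev, grad in zip(previous_changes, gradients)
    ]
    new_weights = [w + dw for w, dw in zip(weights, changes)]
    return new_weights, changes
```

In a full training run this step would be repeated for every weight over the 10,000 epochs stated above, with the gradients supplied by backpropagating the output error.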

Linear and Quadratic Regression

The linear regression and quadratic regression analyses were carried
out using the S-Plus statistical package (Statistical Sciences, 1993).
The quadratic regression methods used the same regression algorithm;
however, a backwards stepwise procedure, also part of the S-Plus
package, was used to adjust the number of parameters in the model.
Quadratic regression included a new set of independent variables. The
additional variables represented two-way interactions between
symptoms. Then the backward stepwise regression was used to
select the best model. The backwards stepwise regression procedure
starts with the model that includes all variables (parameters) for
each symptom and all two-way interactions. Then it systematically
removes parameters that have the smallest effect on the performance of
the model. This was repeated, in our case, until the model size was
equal to the size of the comparison model (see below). In doing this,
the linear model became nonlinear (quadratic), but the method (regression)
remained unchanged.

Compensation for Different Numbers of Parameters

Different models have different numbers of parameters. This makes
the comparison biased in favor of the model with more parameters; the
model with more parameters will predict more of the variance in the
data. To achieve equality across the different models tested, we used
three approaches. One approach constructed a measure of the
proportion of variance explained by the model ($r^2$), together with an
estimate of the proportion of variance expected by chance. The
second approach used goodness-of-fit statistics, the chi-square and F
statistics. These methods, explicitly and implicitly, take into account
the number of free parameters in the models. The third approach used
backward stepwise quadratic regression to systematically limit the
number of predictive variables and thus ensure that both models had the
same number of parameters for the comparison. When we compared the
results to multiple linear regression, we chose a model of size 11; when
comparing to backpropagation, a model of size 21 was chosen. This provided
an unbiased way to account for differences in performance.

Compensation for Sample Size

Another subproblem was that nonlinear methods require more data
because they typically have more parameters to estimate for the same
predictive performance and power. More parameters mean more degrees
of freedom, which means more data are required to compensate for
over-fitting. A combination of two approaches was used. One approach
combined the data from three treatment studies: cognitive behavioral
therapy (CBT), desipramine (DMI), and fluoxetine (FLU). This produced a
larger data set, which typically increases the power of the model to
predict outcome. The drawback of this approach is that the data are no
longer homogeneous by treatment, which can obscure the results. The other
approach treated each study separately. This yields more reliable results,
but the smaller data sets decrease the predictive power of the model. For
completeness, seven data sets of independent variables were created.
Five of these consisted of treatment groups or combinations: one for
each of the different treatments (CBT, DMI, FLU), one for drug only
(DMI+FLU), and one for all treatments (CBT+DMI+FLU). Two additional groups
were created by adding a dummy variable (TxFlag) that indicated which
treatment the patient received: drug with treatment flag
(DMI+FLU+TxFlag) and all treatments with treatment flag
(CBT+DMI+FLU+TxFlag).

Compensation for Irregularities in Data 

Two different prediction algorithms, multiple regression and
backpropagation, were applied to each of the four sets of untransformed
and transformed data on the combined data with treatment flags. This
preliminary analysis indicated the exponential transformation yielded
the best results for these data. Consequently, comparison of all
three methods (multiple regression (MR), backpropagation (BP) and
quadratic regression (QR)) was completed using the exponentially
transformed data. Table 4.3 shows the models and transformations.

Table 4.3 Three population distribution assumptions were analyzed. For each of these four
data sets (one untransformed, three transformed),
multivariate regression (MR) and backpropagation (BP) models were
applied. The transformation that resulted in the best performance was
chosen for subsequent analyses.



Method    Transformation
MR        Raw, Norm, Exp
BP        Raw

Data Representation

This section describes the input and output data representation of the
independent and dependent variables used in this study. The input
data were seven symptom factors: Mood, Cognitions,
Early Sleep Disturbance, Middle and Late Sleep Disturbance, Work and
Interests, Energy and Retardation, and Anxiety. In addition, there
was a variable for Severity, and in some instances, additional
variables indicating the treatment received. In the case of the
quadratic regression, input variables included some subset of those
already discussed in addition to single variables representing the
interaction of two symptoms.
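
As an illustration of the interaction variables used by the quadratic regression, the sketch below forms one product term for every pair of input variables. The short variable names, the dotted naming of the products (modeled on labels such as Anxiety.Severity used elsewhere in this document), and the function name are assumptions for illustration.

```python
from itertools import combinations

# Hypothetical short names for the seven symptom factors.
SYMPTOM_FACTORS = [
    "Mood", "Cog", "ESleep", "MLSleep", "Work", "Energy", "Anxiety",
]


def add_two_way_interactions(row):
    """Return a copy of row (variable name -> value) extended with one
    product term for every unordered pair of the original variables."""
    extended = dict(row)
    for a, b in combinations(list(row), 2):
        extended[a + "." + b] = row[a] * row[b]
    return extended
```

For the seven symptom factors alone this adds 21 pairwise terms; a backward stepwise procedure would then prune the full set of candidate terms down to the chosen model size.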

In addition to the encoding of the data and any other transforms, such
as the exponential transformation discussed in the previous section,
z-score transformations were applied to both independent and
dependent variables as the last step of preprocessing.

The same symptom factors were utilized for two
reasons. First, maintaining consistency with Study 1 will
facilitate integration of these findings later. Second, although
ideally all 21 HDRS items and severity would have been analyzed,
not enough data were available to prevent over-fitting.
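
The z-score step applied as the last stage of preprocessing can be sketched as follows. Whether the study used the sample (n - 1) or the population standard deviation is not stated, so the use of the sample standard deviation here is an assumption, as is the function name.

```python
import statistics


def z_scores(values):
    """Rescale values to zero mean and unit standard deviation."""
    mean = statistics.fmean(values)
    # Sample standard deviation (n - 1 denominator); this choice is an assumption.
    sd = statistics.stdev(values)
    if sd == 0:
        raise ValueError("cannot z-score a constant variable")
    return [(v - mean) / sd for v in values]
```

Each independent and dependent variable would be rescaled separately, column by column.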

Input Representation 

Table 4.4 identifies the input data (independent variables).
It consisted of: (a) seven symptom factors derived from the twenty-one
Hamilton item scores measured prior to treatment; (b) the total for the
pre-treatment Hamilton scores; and (c) the treatment the patient
received (desipramine, fluoxetine, or cognitive behavioral therapy).

Table 4.4 Independent Variables - Inputs to the models. The symptom factors and Hamilton
Total are the raw (untransformed) values as represented on the HDRS scale. The values for the
treatments represent binary flags indicating the treatment the patient received. Only one of the
treatment flags can have the value of 1 for any given patient.



Input Description                         Raw Scale Value
Desipramine Treatment                     0, 1
Cognitive Behavioral Therapy Treatment    0, 1
Fluoxetine Treatment                      0, 1
Symptom Factors [1..7]                    0, 1, 2, 3, 4
Beginning Hamilton total                  [0..65]



Output Representation 

The target output data to be predicted (the dependent variable) was
the change in the severity of the symptoms after treatment. We chose
the raw percent improvement (outcome) as the output since this measure
is commonly reported. The computation for the outcome measure
(percent change in HDRS total) is given by

$\%\Delta HDRS = \frac{HDRS_{baseline} - HDRS_{final}}{HDRS_{baseline}} \times 100$

where $\%\Delta HDRS$ is the response to treatment in terms of
percent change, $HDRS_{baseline}$ is the baseline (pre-treatment) HDRS
total score, and $HDRS_{final}$ is the ending (post-treatment) HDRS
total score.

For DMI, the $HDRS_{final}$ value is week 6; for CBT it is week 16.
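
A small worked sketch of the outcome computation follows. The function name is hypothetical, and the guard against a non-positive baseline is an added assumption.

```python
def percent_change_hdrs(baseline, final):
    """Percent improvement in HDRS total from baseline to end of treatment.

    Positive values indicate a reduction in severity; negative values
    indicate that the final score was higher than the baseline score.
    """
    if baseline <= 0:
        raise ValueError("baseline HDRS total must be positive")
    return (baseline - final) / baseline * 100.0
```

For example, a patient whose HDRS total falls from 24 at baseline to 12 at the end of treatment has a percent change of 50.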

Selection of Population Distribution Function

Irregularities in the data arise from limitations of instruments of
this type to account for underlying probability distribution
information. The best of three normalization functions that were applied
to the data was selected.

The Hamilton Depression Rating Scale, like other psychiatric scales of
depression, is an ordinal scale. It consists of 21 different and
independent ratings that are arbitrarily assigned a fixed numerical
value (see Equation 4.1). The higher numbers on these scales represent
more of a quantity, e.g., helplessness, energy, or suicidal thoughts.
However, the numeric quantity to assign to these scale values is not well
defined. Typically, these numerical values are used directly in quantitative
analysis of psychiatric data {Hamilton:60,Hamilton:67,Filip:93}. These
values alone could have been used; however, a more conservative approach was
taken. Statistics based on these data were used, and new scale values were
assigned which are invariant with regard to the numbers assigned on the
original scale. Such techniques are commonplace in the statistical literature
{Lehmann:86} and have also been used by mathematical psychologists
{Luce:71}. This technique produces correct results independent of the
numerical values of the HDRS items.

A derived scale was constructed from the cumulative
population probability distribution of each of the HDRS items. This
distribution is invariant to the underlying scale values because the
cumulative population distribution for each of the items does not
depend on the numbers assigned to an item. It measures the proportion
of items in the population which have a score less than or equal to
the given score. Functions of the distributional scores are the only
invariants with regard to arbitrary monotone changes of the underlying
scale {Luce:71,Luce:90}.
The cumulative distribution of each item represents a
sample with a fixed distribution. Three distributions were chosen:
(1) exponential (Exp); (2) gamma (Gam); and (3) Gaussian (Norm). The
parameters of the gamma and Gaussian distributions were chosen so that
the means and variances coincided with the distribution of the data.
The derived scale values were chosen to be the inverse of these
constructed distribution functions at the HDRS item values. These
derived scale values are the values of the hypothesized random
variables which match the probabilities obtained from the population
distribution function. This transformation removes the compression
inherent near probability one of the population distribution function
and constructs a theoretically motivated scale from ordinal data. The
procedures used for these transformations are described in Appendix
Transformations, Luciano, U.S. Provisional Patent Applic. S.N. 60/
The original data of (N=99) input-output pairs (see Section
Data Representation) were transformed to create four datasets.
One remained untransformed (Raw) while three were transformed:
exponential (Exp); gamma (Gam); and Gaussian (Norm). The same
transformations were applied to individual scores for both pre- and
post-treatment measurements. The total (severity) scores were
calculated from the transformed values. Multiple linear regression
and backpropagation were then applied to each of these four datasets.
The dataset which yielded the best performance was then used in all
subsequent analyses.
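
The derived-scale construction described above can be sketched for the exponential case as follows. The actual procedure is the one given in the referenced appendix, which is not reproduced here; this sketch assumes a unit-rate exponential distribution and estimates the cumulative proportion as count/(n + 1) so that the highest score does not map to probability one (and hence to infinity). Both choices, and the function name, are assumptions.

```python
import math
from collections import Counter


def exponential_derived_scale(scores, rate=1.0):
    """Map each observed ordinal score to a derived scale value.

    The cumulative proportion p of observations at or below a score is
    passed through the inverse exponential distribution function,
    x = -ln(1 - p) / rate.
    """
    n = len(scores)
    counts = Counter(scores)
    derived = {}
    cumulative = 0
    for score in sorted(counts):
        cumulative += counts[score]
        p = cumulative / (n + 1)  # assumed convention keeping p below 1
        derived[score] = -math.log(1.0 - p) / rate
    return derived
```

Because only the rank order and frequencies of the scores enter the computation, relabeling the original scale with any increasing set of numbers leaves the derived values unchanged, which is the invariance property the text relies on.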

Preliminary analysis indicated better results with continuous
outcome as the target of the prediction (i.e., the percent change in
the patient) than with predictions of categorical outcome (i.e.,
the patient recovered or did not recover). Most subsequent detailed
analysis therefore used a continuous output measure, although some
categorical results are presented below. Preliminary analysis also
indicated that the exponential transformation yielded the best results
for the neural network model. Consequently, the exponential
transformation was used in all subsequent analysis.

Referring now to Figure 4-2, a schematic representation of the effect 
of normalizing transformations on reducing nonlinearity of 
score-to-output relationships (or skewness of distributions) is 
illustrated. In the transformation, the area under the curve is preserved. 
The transformation redistributes the position of the data values along the 
x-axis in order to preserve the areas under the curve between adjacent 
score values while redistributing these data to best approximate a 
normal distribution. Equal areas under the curve between percentiles map 
to equal areas under the curve in the new distribution. 
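
The equal-area mapping described above can be illustrated with a short Python sketch that converts ordinal item scores to an exponential scale. This is only a sketch under stated assumptions: the function name `exponential_scale`, the mid-rank convention for the cumulative probability, and matching the exponential distribution on the mean alone are choices made for this example and are not taken from the Appendix that describes the procedures actually used.

```python
import math
from collections import Counter


def exponential_scale(scores):
    """Map each distinct ordinal score to a value on an exponential scale.

    Each score is assigned its cumulative probability in the sample, and the
    derived value is the exponential quantile at that probability.
    """
    n = len(scores)
    mean = sum(scores) / n  # the exponential distribution is matched on the mean
    counts = Counter(scores)
    derived = {}
    below = 0  # number of observations strictly below the current score
    for value in sorted(counts):
        # Assumption: mid-rank probability, so the top score never maps to p = 1.
        p = (below + counts[value] / 2) / n
        # Inverse of the exponential distribution function F(x) = 1 - exp(-x / mean).
        derived[value] = -mean * math.log(1.0 - p)
        below += counts[value]
    return derived


# Hypothetical item scores on a 0-4 ordinal scale (not data from the study).
example_scores = [0, 0, 0, 0, 1, 1, 2, 2, 3, 4]
example_scale = exponential_scale(example_scores)
```

Because the cumulative probability rises with the score, the derived values keep the original ordering while the spacing between adjacent scores is stretched where the observed distribution is compressed.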

Comparison with Chance Performance 
The proportion of variance expected by chance, given the number of 
parameters and the number of samples, is approximated by dividing the 
number of parameters in the model by the number of samples. As an 
auxiliary verification of this estimate, we used S-Plus to generate 
random (chance) data (N=99, normally distributed, mean = 0, standard 
deviation = 1), which were used in place of the actual data (symptom, 
treatment and outcome data), and then tested the predictive power of the 
model on these chance data. A backpropagation network with the same 
configuration used in the above described analysis (two hidden units) 
was trained and tested on these random (chance) data. The purpose of 
this auxiliary test was to verify chance performance on chance data as 
a null hypothesis. 
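
The chance estimate described above amounts to a single division, sketched here in Python. The function name `expected_chance_r2` and the parameter count of 27 are assumptions chosen for this example; with 99 samples they give the value 0.2727 that is quoted as the expected r2 in the results.

```python
def expected_chance_r2(n_parameters, n_samples):
    """Approximate proportion of variance explained by chance alone."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return n_parameters / n_samples


# Assumed example: a model with 27 free parameters fitted to 99 samples.
chance_r2 = expected_chance_r2(27, 99)
print(f"expected chance r2 = {chance_r2:.4f}")  # prints "expected chance r2 = 0.2727"
```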

Interpretation of Backpropagation Weights 

While it is clinically useful to be able to predict outcome, it is 
even more useful to know to what degree each of the symptoms 
contributes to the prediction. The symptoms of the backpropagation 
network model were ranked by influence on the response pattern. This 
gives a rough indication of the most important symptoms. Because 
backpropagation is nonlinear, a linear measure of the influence of a 
symptom (input variable) on the response does not exist. As a rough 
approximation, we assumed that the transfer functions at the neural 
network nodes operate in their linear ranges. 

For each symptom, the influence was determined and 
ranked as follows: 

1. The weight from the symptom (input) unit to hidden unit 1 was 
multiplied by the weight from hidden unit 1 to the output. 

2. The weight from the symptom (input) unit to hidden unit 2 was 
multiplied by the weight from hidden unit 2 to the output. 

3. The symptom's influence is the sum of the products 
obtained in steps 1 and 2. 

The symptoms then were ranked by their unsigned values. A threshold 
equal to 20% of the maximum unsigned value was computed. Symptoms 
that fell below this threshold were assumed to be not significant. 
Negative values were interpreted to inhibit a positive response or 
indicate nonresponse. 
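
The three steps and the 20% threshold can be sketched in Python as follows. The function name `rank_influence`, the dictionary layout of the weights, and the numeric weights are hypothetical and serve only to illustrate the calculation; they are not the fitted weights of the network described here.

```python
def rank_influence(input_weights, output_weights, threshold_fraction=0.20):
    """Rank symptoms by the linearized influence of each input on the output.

    input_weights maps a symptom name to its weights into each hidden unit;
    output_weights holds the weight from each hidden unit to the output.
    """
    influence = {
        symptom: sum(w_in * w_out for w_in, w_out in zip(weights, output_weights))
        for symptom, weights in input_weights.items()
    }
    # Rank by unsigned value, largest first.
    ranked = sorted(influence.items(), key=lambda item: abs(item[1]), reverse=True)
    cutoff = threshold_fraction * abs(ranked[0][1])
    # The third field is False for symptoms that fall below the threshold.
    return [(symptom, value, abs(value) >= cutoff) for symptom, value in ranked]


# Hypothetical weights for three inputs and two hidden units (not fitted values).
example_input_weights = {
    "Mood": (-1.0, -0.5),
    "Severity": (0.8, 0.2),
    "Work": (0.1, -0.1),
}
example_output_weights = (2.0, 1.0)
example_ranking = rank_influence(example_input_weights, example_output_weights)
```

In this made-up case Mood has the largest unsigned influence and a negative sign, so it would be read as inhibiting a positive response, while Work falls below the threshold.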

In this section it is concluded that the relationships between 
pre-treatment symptoms and outcome are nonlinear because the nonlinear 
methods explain more variance than the linear method, and that it is 
allowance for nonlinearity in the method rather than the specific 
nonlinear method that is important in obtaining the better results. 
We also show that outcome can be predicted, but weakly. The 
proportions of variance explained by the nonlinear models are highly 
significant, but low. The symptoms with the highest predictive power 
in these data were mood, severity, and middle and late sleep 
disturbances. Finally, the choice of the exponential form 
as a distribution function is validated. 

Nonlinear Method Yields a Better Model 

The performance of the linear regression and nonlinear models was 
compared using an r to z transformation. This method was used to 
determine if the correlation coefficients of the two models are 
significantly different from each other. Table 4.5 demonstrates that the 
nonlinear regression method (Backpropagation) explains significantly more 
of the variance in these data than the linear regression (Multiple 
Regression) model (p<0.0001). The nonlinear regression 
method (Backpropagation) also accounts for significantly more of the proportion 
of variance in the data than can be attributed to chance. Table 
4.7 shows the significance (p < 0.0001) of the 
goodness of fit of the backpropagation model to the full test and 
training set (N=99). The goodness of fit test was performed on the 
prediction results obtained from analysis of the full data set (N=99). 
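
The r to z comparison can be sketched in Python as shown below. The sketch assumes the textbook formula for two independent samples and a sample size of 99 for both models; neither assumption is confirmed by the text, so the resulting normal deviate need not reproduce the tabulated value. The function names are hypothetical.

```python
import math


def fisher_z(r):
    """Fisher r-to-z transformation of a correlation coefficient."""
    return math.atanh(r)


def compare_correlations(r1, n1, r2, n2):
    """Normal deviate and two-tailed p for the difference of two correlations.

    Assumption: the textbook formula for two independent samples.
    """
    z1, z2 = fisher_z(r1), fisher_z(r2)
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    deviate = (z1 - z2) / standard_error
    p_two_tailed = math.erfc(abs(deviate) / math.sqrt(2.0))
    return z1, z2, deviate, p_two_tailed


# Correlations as listed in Table 4.5; the sample size of 99 is assumed here.
z_mr, z_bp, deviate, p_value = compare_correlations(0.373, 99, 0.748, 99)
```

The two z values agree with the z-scores in Table 4.5 to within rounding of the listed correlations.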

Table 4.5 Result of r to z transformation and comparison of significance of differences of 
the goodness of fit for the linear multiple regression model versus the nonlinear backpropagation 
model. N=99 

Comparison of Difference in Goodness of Fit 

System                 r        z-score 
Multiple Regression    0.373    0.392 
Backpropagation        0.748    0.969 
Normal deviate                  -3.716 
p                               0.0002 


Table 4.6 Comparison of significance for linear and nonlinear 
methods. Significance values were calculated using the F-statistic for the linear 
method and a method based on maximum likelihood for the nonlinear methods. TxFlag indicates that the 
data set included a flag indicating the treatment the patient received, * indicates p<0.05, ns = 
not significant (and the significance level is not listed in the chart), x indicates the analysis could not 
be performed (not enough data), ** indicates 
detailed analysis in text, r is Pearson's r, r2 is the 
proportion of variance explained by the model, p was computed using 
the appropriate goodness of fit test. 

Comparison of Goodness of Fit and Significance 

Data set                          Multiple Regression   Backpropagation   Quadratic Regression 
(N = # Samples)                   (linear)              (nonlinear)       (nonlinear) 
                                  r2 (p <)              r2 (p <)          r2 (p <) 
CBT (13)                          .8810 (ns)            .4642 (ns)        x 
DMI (49)                          .1548 (ns)            .5685 (.005*)     .5399 (.01*) 
FLU (37)                          .1549 (ns)            .5510 (.079)      .8696 (.005*) 
DMI + FLU (86)                    .0895 (ns)            .3147 (.05*)      .4318 (.00043*) 
DMI + FLU + CBT (99) **           .0917 (ns)            .3156 (.025*)     .3875 (.0005*) 
DMI + FLU + TxFlag (86)           .1395 (ns)            .5601 (.0001*)    .4232 (.00081*) 
DMI + FLU + CBT + TxFlag (99)     .1389 (ns)            .4389 (.0005*)    .4062 (.0005*) 

Table 4.7 Summary of results for the full training set. The table shows percent correct, Root 
Mean Square (RMS) error, and Proportion of Variance (r2) for the backpropagation network with two 
hidden nodes. Input data were factor scores, raw or transformed using an exponential function 
(Exp). Output data were categorical or continuous. Momentum was 0.9, learning rate was 0.01. 
n/a = not applicable. 

Transformation   % correct   RMS     r2       F        p 

Categorical output with logistic function at output 
Raw              81.8        0.367   0.4646   2.2819   0.003013 
Exp              75.8        0.368   0.4587   2.2284   0.003840 

Continuous-normalized output with logistic function at output 
Raw              n/a         0.169   0.4533   2.1804   0.004770 
Exp              n/a         0.113   0.6661   5.2459   0.000002 



Nonlinear Methods Significantly Better Than Chance 
As an auxiliary confirmation, a backpropagation network was run on 
random data. The proportion of variance (r2) obtained was 
slightly lower than our theoretical calculation. The r2 obtained 
from predicting random variables was 0.2454, whereas the r2 expected 
was 0.2727. Table 4.6 shows that in all but the cases 
of cognitive behavioral therapy alone (CBT) and fluoxetine alone (FLU), 
the backpropagation model was significantly better than chance. 
For the cognitive behavioral therapy data (CBT), it was not possible 
to run the quadratic regression model because there were too many 
parameters (21) for the number of samples (13). In all other data sets, 
the quadratic regression model was significantly better than chance. 
In contrast, the linear method performed at chance for all data sets. 

Results Independent of Particular Nonlinear Model 
This section shows that multiple linear regression on individual 
symptom factors was not significant, whereas multiple linear 
regression on the nonlinear data, which included symptom interaction 
terms (quadratic regression), was significant. Table 4.8 
shows the poor results obtained from individual symptom data alone; 
Table 4.9 shows the improved results from the quadratic 
regression model of comparable size to the backpropagation model. 
This suggests nonlinearities should be included either in the method 
or the data to improve performance, and that the improved performance 
is not a result of bias introduced by more parameters in one of the 
models. 

Table 4.8 Multiple Regression on Combined Data. The results of multiple linear regression to 
predict outcome. Data were combined from three studies: cognitive behavioral therapy 
(CBT, N=13), desipramine (DMI, N=49), and fluoxetine (FLU, N=37). Proportion of Variance 
explained by the model, given by Pearson's r2 = 0.09170485. 

Symptom        Value           Std. Error   t value         p 
MLSleep        2.068060e-01    0.1234877    1.674709e+00    0.09746297 
Mood           -1.142409e-01   0.1206277    -9.470532e-01   0.34614758 
ESleep         1.026300e-01    0.1115722    9.198525e-01    0.36010875 
Anxiety        8.396366e-02    0.1084657    7.741031e-01    0.44089859 
Severity       8.205999e-02    0.1578884    5.197341e-01    0.60452487 
Work           -3.288226e-02   0.1052959    -3.122844e-01   0.75554676 
Energy         3.182421e-02    0.1050965    3.028094e-01    0.76273390 
Cog            -1.524567e-02   0.1107273    -1.376866e-01   0.89079566 
(Intercept)    -3.394970e-06   0.1004598    -3.379431e-05   0.99997311 



Table 4.9 The quadratic regression model with 21 parameters (K = 21). Data were combined 
from two drug studies: desipramine (DMI, N = 49) and fluoxetine (FLU, N = 37). Best fitting 
model of size 21 selected by a backwards stepwise procedure (Statistical Sciences, 1993) from the 
model including all two way interactions. Proportion of Variance explained by the model, given by 
Pearson's r2 = 0.3874559. 

Symptom            Value        Std. Error   t value      p 
Cog.Severity       0.46542411   0.13375693   3.4796263    0.0008249741 
MLSleep            0.3309996    0.10911044   3.0336204    0.0032811594 
Mood.Cog           0.3621551    0.11982175   3.0224490    0.0033913402 
ESleep.Work        0.2670521    0.09781181   2.7302645    0.0078202408 
Work.Anxiety       0.3092271    0.11835542   2.6126992    0.0107732789 
ESleep.Severity    0.4036986    0.17558200   2.2992025    0.0241699457 
(Intercept)        0.2600431    0.11764378   -2.2104278   0.0300061936 
ESleep             0.2006639    0.10474988   1.9156477    0.0590747738 
Mood.Anxiety       -0.2122875   0.11095378   -1.9132966   0.0593796231 
Anxiety.Severity   0.3176015    0.16949988   1.8737565    0.0647092028 
ESleep.Anxiety     -0.2007385   0.11197744   -1.7926687   0.0769029771 
Cog.ESleep         -0.2234796   0.12811221   1.7444053    0.0850276002 
ESleep.Energy      -0.1731981   0.10063794   1.7210024    0.0892149166 
MLSleep.Anxiety    -0.1866428   0.12750812   -1.4637718   0.1472748574 
Severity           0.1978865    0.14989573   -1.3201611   0.1906423332 
Mood.MLSleep       0.1340615    0.10611172   1.2633998    0.2102087949 
Cog.Anxiety        -0.1415459   0.11481283   -1.2328404   0.2213381323 
Cog.Energy         0.1359817    0.11237262   1.2100967    0.2298962726 
ESleep.MLSleep     -0.1350899   0.11977899   -1.1278264   0.2628505714 
Mood.Severity      0.1218931    0.12952479   0.9410797    0.3495695121 
Mood.ESleep        -0.1049684   0.11528845   -0.9104852   0.3653717773 

Relationships are Nonlinear 

It is concluded that the relationships are nonlinear and that the choice of 
the specific nonlinear model was not important in obtaining increased 
performance. This was demonstrated in two ways. First, a 
quadratic regression model was created, which included variables for all 
two-way interactions between symptoms. A backward stepwise procedure 
was used to obtain a model of the same size as the backpropagation model. The 
results were comparable (see Table 4.10). Second, to rule out the 
possibility that the increased number of parameters was responsible 
for all of the improved performance, we built another quadratic 
regression model, this time matched with the number of parameters in 
the linear model. Table 4.11 shows the improved results of 
the linear regression with the inclusion of the interaction terms, but 
with a model size of the original regression on symptoms alone (i.e., 
without terms for symptom interactions). Table 4.10 shows 
the proportion of variance explained by each of the models. There was 
a significant improvement in the performance of the linear regression 
model when its variables included the nonlinearities, i.e., the two-way 
interactions between symptoms. 
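
The construction of the interaction variables can be sketched in Python as follows. The function name `add_interactions` and the numeric scores are hypothetical; the symptom names follow the factor labels used in the tables, and the backward stepwise selection that reduces the full model to a given size is not shown.

```python
from itertools import combinations


def add_interactions(profile):
    """Return the symptom scores plus all two-way interaction products."""
    expanded = dict(profile)
    for first, second in combinations(profile, 2):
        # An interaction is estimated by multiplying the two symptom severities.
        expanded[f"{first}.{second}"] = profile[first] * profile[second]
    return expanded


# Hypothetical standardized factor scores for one patient (not study data).
example_profile = {
    "Mood": 1.0, "Cog": -0.5, "Work": 0.0, "Anxiety": 2.0,
    "Energy": 0.5, "ESleep": -1.0, "MLSleep": 1.5, "Severity": 0.25,
}
example_terms = add_interactions(example_profile)
```

With eight symptom factors this yields 28 interaction terms in addition to the eight main effects, which is why a selection step is needed before the model can be matched in size to the linear or backpropagation models.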

Table 4.10: Comparison of variance explained (r2) for linear and 
nonlinear methods with different numbers of parameters. The number of 
parameters in the nonlinear model (QR) was adjusted to 12 in order to 
match the linear model. This removed the bias associated with more 
parameters. BP = backpropagation, QR = quadratic regression, MR = multiple regression. The 
numbers in parentheses represent the number of parameters in the 
model. For BP the numbers vary with the data set and are specified 
with each entry. Significance levels are given for QR (11); Table 
4.6 gives the significance levels for the other models. 

Comparison of Explained Variance (r2) 

Data set                            BP           QR (21)   MR (11)   QR (11) 
                                    r2           r2        r2        r2 (p) 
DMI + FLU (N = 86)                  .3147 (21)   .4318     .0895     .2913 (0.005) 
DMI + FLU + CBT (N = 99)            .3156 (21)   .3875     .0917     .2736 (0.003) 
DMI + FLU + TxFlag (N = 86)         .5601 (25)   .4232     .1395     .3199 (0.002) 
DMI + FLU + CBT + TxFlag (N = 99)   .4389 (27)   .4062     .1389     .3095 (0.001) 



Table 4.11: The results of the quadratic regression model with 11 
parameters (K=11). Data were combined from two 
drug studies, desipramine (DMI, N=49) and fluoxetine (FLU, 
N=37), and included a variable that indicated which treatment the 
patient received. Best fitting model of size 11 selected by a 
backwards stepwise procedure (Statistical Sciences, 1993) from the model including 
all two way interactions. Proportion of Variance explained by the 
model, given by Pearson's r2 = 0.3199294. 

Symptom            Value        Std. Error   t value     p 
Cog.Severity       0.3312666    0.12265150   2.700876    0.008545030 
ESleep.Work        0.2829667    0.10499048   2.695165    0.008679468 
DMI                -0.2726085   0.10260370   2.656907    0.009630856 
Anxiety.Severity   0.2325933    0.09678584   2.403175    0.018725709 
Mood.Cog           -0.2574497   0.10801041   2.383564    0.019677175 
Cog.ESleep         -0.2121521   0.11052312   -1.919527   0.058721482 
MLSleep            0.1912295    0.10154901   1.883125    0.063560872 
Work.Anxiety       -0.2283782   0.12188894   -1.873658   0.064873109 
ESleep.Severity    0.2464627    0.13517284   1.823315    0.072240277 
(Intercept)        -0.1698153   0.11105281   -1.529140   0.130436621 
Mood.ESleep        -0.1566263   0.11703232   -1.338317   0.184836384 

Symptoms Are Weak Predictors of Response 

Table 4.6 demonstrates that symptoms are significant 
predictors of outcome. They are, however, weak predictors of response 
because, in general, they account for less than half of the variance. A 
preliminary analysis is reported of which symptoms, symptom 
combinations, or symptom interactions seem to be the most important in 
terms of their contribution to predicting the response. 

The input patterns (symptom profiles) for which the network 
predicts the best possible outcome represent prototypical patients. The weight 
coefficients that are important in the prediction also help refine the 
patient profile. 

The column heading in Table 4.12 labeled Influence indicates the 
contribution of each symptom (input) on the outcome (response). Table 
4.12 ranks the contribution in terms of the percent change in response for 
each symptom factor. These results indicate that for the combined 
data (all three studies) Mood, Severity, and Middle and Late Sleep 
disturbance have the greatest influence in determining the outcome for 
the backpropagation method. For the regression method, the three most 
significant indicators were Cognitions and Severity combined, Middle 
and Late Sleep, and Mood and Cognitions combined. Mood, Severity, and 
Middle and Late Sleep disturbance appear, alone or in an interaction 
term, in the top three for both 
methods, which may be an indication of a stronger relationship with 
outcome. 

Table 4.12 Comparison of rank of independent variables (symptoms) on outcome between 
two nonlinear methods, backpropagation and quadratic regression. (-) indicates that the variable predicts poor 
outcome. The database used was CBT+DMI+FLU (no treatment flag). 

Predictors of Response 

Backpropagation                Quadratic Regression 
Symptom          Influence     Symptom(s)            p 
Mood (-)         -32.925       Cog.Severity          0.0008249741 
Severity         21.637        MLSleep               0.0032811594 
ML Sleep         21.376        Mood.Cog (-)          0.0033913402 
Energy           20.081        ESleep.Work (-)       0.0078202408 
Cognitions (-)   -13.354       Work.Anxiety (-)      0.0107732789 
Anxiety          8.209         ESleep.Severity       0.0241699457 
E Sleep          7.275         (Intercept) (-)       0.0300061936 

Population Best Approximated by Exponential Function 

Irregularities in the data that resulted from the limitations of 
ordinal scale instruments were minimized most when the data were 
compared after they were transformed by an exponential distribution 
function. As the ability of backpropagation to learn nonlinear 
mappings relies on a sufficient number of hidden nodes and nonlinearity 
of the nodes themselves, it is reasonable to examine the effect of the 
transformation in the continuous-normalized case with a logistic 
function at the output. Table 4.13 shows the Root Mean 
Square (RMS) error from worst to best for the raw data followed by 
each of the transformations. Note that the errors for 
backpropagation were smaller than those for multiple regression. The 
difference in RMS error is marginal when the transformation is good, 
i.e. when the transformation matches the underlying distribution and 
effectively linearizes the input data. 

Table 4.13 Comparison of performance of multiple regression and backpropagation 
algorithms on data transformed to assume one of three probability distribution functions. Values 
given are Root Mean Squared (RMS) Error. 

Algorithm             Raw     Normal   Gamma   Exponential 
Multiple regression   0.262   0.249    0.231   0.204 
Backpropagation       0.241   0.215    0.203   0.198 
Difference            0.021   0.034    0.028   0.006 

Furthermore, backpropagation slightly outperformed multiple 
regression even with the best transformation method (which assumed 
an exponential distribution as the underlying distribution). This 
indicates that the non-linear mapping capability of backpropagation 
enabled it to cope with the non-standard underlying distribution which 
could not be remedied by any of the transformations. 

Outcome-discussion 

The results indicate that nonlinear methods may capture more of the 
information in the data than previously was captured by linear 
techniques. These preliminary results indicate that the data were 
nonlinear, that the nonlinear methods explained more of the variance 
in the data, and that it is the use of a nonlinear method that is 
important, not the particular nonlinear method. We also showed that 
symptoms are significant predictors of outcome. They are weak 
predictors in that they only explain up to about half of the variance 
in the data, i.e. Table 4.10 shows 42% of the variance (r2) explained 
using quadratic regression and 56% using backpropagation, and Table 
4.7 shows 45% to 67% explained using backpropagation with a logistic 
function at the output node. 

The results are promising to the clinical community as they indicate 
that the interactions among the symptoms of depression are important 
and that studying the interactions among symptoms may increase our 
understanding of depression. It is possible that depressive subtypes 
may emerge using nonlinear analysis that may not have been detectable 
when the focus was on individual symptoms alone. 

In addition, existing data can be reanalyzed. New methods may be able 
to create new knowledge from existing data sets without the additional 
cost of clinical trials. By using the quadratic regression method described, 
which used multiplication of symptom severities to estimate 
interactions between symptoms, researchers can now reanalyze their 
data. This technique allows clinical researchers to use regression 
methods already familiar to them, which would facilitate reanalysis. 

Statistically significant predictors of outcome have been found in 
individual studies; however, the results are not consistent across 
studies. The nonlinear models we presented accounted for a 
significant proportion of variance, and so we, too, were able to 
reject the null hypothesis and state that performance was better than 
chance. We have shown that some information is being captured by the 
symptoms. On the other hand, there remain significant predictors of 
outcome yet to be discovered. Furthermore, we expect better models to 
result from further study. It would, of course, be better to have 
more data, in particular for the cognitive behavioral therapy study. 
Some references to methods that attempt to handle small sample sizes 
more effectively are presented. Notwithstanding the above, the nonlinear 
models' fits to the data are highly significant and can, in some cases, 
account for more than half of the proportion of variance in these data. 
Any improved theoretical model would have to capture the empirical 
relationships captured by the backpropagation and quadratic regression 
models. 

Overall severity at baseline was not found to be a 
significant predictor of response using linear methods. Using 
quadratic regression, overall severity alone was not 
predictive of response; however, overall severity crossed with 
impairment in cognitions and overall severity crossed with early 
insomnia both predicted favorable response to cognitive behavioral 
therapy, desipramine and fluoxetine. 

The best individual predictor of response to treatment was middle 
and late sleep disturbance. Significant interaction terms were found for 
severity of depression crossed with severity of cognitive impairment, 
severity of mood crossed with severity of cognitions, severity of early 
sleep crossed with work inhibition, severity of anxiety crossed with work 
inhibition and severity of early sleep disturbance crossed with 
overall severity of the depressive syndrome. Bowden et al. 
{Bowden:93} found no baseline symptoms to be predictive of 
outcome. Middle and late sleep disturbance have been found by others 
to be predictive of response to amitriptyline, imipramine and 
tranylcypromine, but not desipramine. There were no results reported 
for symptoms in the Johnson et al. study of response to CBT 
{Johnson:94}. 

Further data would be needed to thoroughly substantiate the findings, 
but the results indicate that in CBT and DMI, the relationship between 
symptom severity and outcome is nonlinear. The inability of 
linear models to predict outcome may be a contributing factor to 
previous accounts where symptoms and severity were not found to be 
significant predictors of outcome for desipramine, fluoxetine, and 
cognitive behavioral therapy. 

Effects of Scale Normalizing Transformations 
The results indicated that the choice of the nonlinear method, i.e., 
backpropagation or quadratic regression, was not important. From this 
it was concluded that it was reasonable to use the backpropagation 
algorithm to select the probability distribution function. Among 
the different transformations, the exponential transformation resulted in 
the lowest errors overall. It is interesting that the exponential 
distribution gives the best result as a data transformation. The 
exponential distribution is the maximum entropy distribution with a given 
finite mean whose support is the entire positive half line 
{Rao:73}. 

The difference between the performance of the model produced by 
backpropagation and that produced by the linear regression method on 
the transformed data is that backpropagation can process the scale 
dependent nonlinearities between the independent and dependent 
variables, whereas the linear method cannot. The linear method relies 
more on these data transformations than the nonlinear method, and so the 
increase in performance is expected to be greater using the 
transformation with the linear technique than using the transformation 
(which normalizes the scale) with a method that can do this anyway. 
In short, backpropagation can cope with scale dependent nonlinearities 
between dependent and independent variables by itself, whereas 
multiple linear regression relies more heavily on transformations. 

Backpropagation has the ability to learn arbitrary nonlinear 
mappings from inputs to outputs provided that there are enough hidden 
units and enough data to estimate the parameters. Put into the 
context of predicting outcome from symptoms, there is no need to 
assume linear relationships between symptoms and outcome. If there are 
nonlinearities, backpropagation will learn to approximate them by 
itself (Figure 4.1); however, it is harder, slower, more 
error-prone, and needs more data to do so. So, preprocessing to 
normalize the scale is desirable. 

Another way to cope with the inhomogeneous scale is to transform 
the input to make the mapping between the actual data and distribution 
assumption closer (Figure 4.2). For example, assume that 
in the population (i.e. in the ideal limit) the symptom values 
in the underlying scale have a linear effect on the outcome, and that 
these values have some typical distribution such as the normal 
distribution. Then the nonlinearity can be thought to be caused by 
the non-homogeneous mapping from this ideal scale to the actual 
symptom scale employed. If so, the nonlinearity can be removed by 
transforming the symptom value in a non-homogeneous manner so that the 
observed distribution matches the ideal distribution and in effect 
becomes (or appears) linear. 

Outcome -samplesize 

One drawback of nonlinear systems is that they require more data to 
extract explanatory rules. In situations such as clinical research 
in depression, large sample sizes are difficult to achieve. As such, 
sample size is a limiting factor in training neural network models 
such as backpropagation. In this study, data from ninety-nine 
patients (combined from three studies) were available. Because these 
data are inherently noisy, and because backpropagation, as a rule of 
thumb, typically requires about ten input-output pairs per free 
parameter, ninety-nine input-output pairs must be considered a 
small sample size, which severely restricts the network's ability to 
generalize. A larger sample size would be needed before the 
predictive capacity of baseline symptoms can be assessed using a 
backpropagation model. 
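
The rule of thumb can be made concrete with a short Python sketch. The parameter-counting formula assumes a fully connected network with one hidden layer and a bias weight on every hidden and output unit, and the choice of eight symptom factors as inputs is likewise an assumption for this example; both function names are hypothetical.

```python
def count_parameters(n_inputs, n_hidden, n_outputs=1):
    """Free parameters of a fully connected network with one hidden layer.

    Assumption: every hidden and output unit also has a bias weight.
    """
    return (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs


def samples_suggested(n_parameters, pairs_per_parameter=10):
    """Sample size suggested by the ten-pairs-per-parameter rule of thumb."""
    return n_parameters * pairs_per_parameter


# Assumed example: 8 symptom factors as inputs, two hidden units, one output.
n_params = count_parameters(n_inputs=8, n_hidden=2)
n_needed = samples_suggested(n_params)
```

Under these assumptions the network has 21 free parameters, so the rule of thumb suggests roughly 210 input-output pairs, about twice the ninety-nine that were available.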

Since the nonlinear methods necessitate larger sample sizes, more 
data would be useful in order to further validate our model. In lieu of a 
larger sample size, other techniques may be useful in validating the 
predictive power of the nonlinear models. One next step would be to 
use techniques based on resampling theory. The resampling techniques 
use a stratified random sample, or resample the entire sample set (99 
in this case) many times, instead of the conventional method of 
splitting the data into two disjoint training and test sets. 
Resampling techniques include the jackknife method and the bootstrap 
method {Efron:82,Efron:91}. In bootstrap methods, for example, 
the training and test sets are kept as one large sample. The training 
set is developed by resampling the entire set with replacement, i.e. each sample is 
replaced before another sample is taken. This method can be used to 
generate goodness of fit statistics. 
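
Bootstrap resampling of the sample indices can be sketched in Python as follows. Using the samples that were never drawn as the test set for each round (the out-of-bag samples) is an assumption made for this example, as are the function name `bootstrap_split`, the number of rounds, and the random seed.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw one bootstrap training set and its out-of-bag test set.

    Indices are drawn with replacement; samples never drawn form the test set.
    """
    train = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(train)
    test = [i for i in range(n_samples) if i not in drawn]
    return train, test


# Assumed settings: 99 samples, 200 bootstrap rounds, fixed seed for repeatability.
rng = random.Random(0)
splits = [bootstrap_split(99, rng) for _ in range(200)]
```

A model would be refitted on each training set and scored on the matching test set, and the spread of the scores across rounds would give an estimate of the variability of the goodness of fit.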

Choice of Predictor Variables 

Variables other than the Hamilton items may be used in the above 
method. Other clinical data, such as pre-treatment neurotransmitter 
metabolites from blood or urine, may also be used to define idealized 
patient profiles and idealized or standardized patterns of recovery of a 
patient receiving a specified treatment regime. Other forms of data such as 
non-invasive neuroimaging information, demographic information, family 
history, and genetic information may be used for their predictive 
capacity for establishing treatment outcome predictors. 

Further, with the use of patient symptom profiles and patient 
symptom profiles in response to a treatment regime, where the outcome to 
treatment is variable based upon the currently observed patient symptoms, 
other disorders may be modeled using the instant invention by providing a 
database of known baseline symptoms and responses to treatment gathered 
from the clinical literature and experience to the symptom profiler, training 
the outcome profiler to provide idealized response patterns, and using the 
output from the trained outcome profiler to generate recommended 
treatment regimes and expected patterns of recovery for individual patients 
based upon the symptoms that each exhibits and the response to treatment 
that each exhibits. Such disorders, for example, may include AIDS and 
breast cancer. As with the method for the disorder described above, patient 
symptom information may be added to the system profiler to increase the 
precision of the idealized pattern generated by the symptom profiler and the 
outcome profiler. 

The foregoing is considered only illustrative of the currently 
preferred embodiments of the invention presented herein. Since numerous 
modifications and changes will occur to those skilled in the art, it is not 
desired to limit the invention to the exact method or application of that 
method used to illustrate the embodiments comprising this invention. 

What is claimed is: 



CLAIMS 

1. A method for predicting patient response to treatment of unipolar 
depression from at least one pre-treatment clinical symptom, comprising: 
a) performing at least one measurement of a symptom on said patient and 
measuring said symptom so as to derive a baseline patient profile; 
b) defining a set of a plurality of predictor variables which define the data of 
the baseline patient profile, said set of predictor variables comprising 
predictive symptoms and a set of treatment options; 
c) deriving a model that represents the relationship between patient response 
and the set of predictor variables; and 
d) utilizing the model of step c) to predict the response of said patient to the 
treatment. 

15 

2. The method according to claim 1 , wherein said relationship in step c) 
is determined via at least one automated algorithm. 

3. The method according to claim 2, wherein said model is a multilayer 
20 neural network, and wherein said at least one algorithm is a back 

propagation learning algorithm. 

4. The method according to claim 3, wherein said neural network has at 
least three layers and at least two hidden units. 

5. The method according to claim 1, wherein said relationship in step c) 
is determined via quadratic regression. 

6. The method according to claim 5, further comprising using a set of 
independent variables in said quadratic regression, said set of independent 
variables representing interactions between said predictive symptoms. 






7. The method according to claim 6, further comprising estimating said 
interactions between said predictive symptoms by multiplying symptom 
severities. 

8. The method according to claim 1, wherein said model is non-linear. 

9. The method according to claim 1, further comprising utilizing the 
model of step c) to rank by response influence the predictive symptoms to 
indicate the predictive importance of a predictive symptom. 

10. The method according to claim 9, wherein said model is a multilayer 
neural network utilizing a back propagation learning algorithm having three 
layers and two hidden units, and an output; and said influence of a predictive 
symptom is determined by summing a first product and a second product, 
said first product being a first weight from said predictive symptom to a 
first hidden unit multiplied by a second weight from said first hidden unit 
to said output, and 
said second product being a third weight from said predictive symptom to a 
second hidden unit multiplied by a fourth weight from said second hidden 
unit to said output. 

11. The method according to claim 1, wherein said set of predictive 
symptoms comprises a plurality of: Mood, Work, and Energy. 

12. The method according to claim 1, wherein said set of predictive 
symptoms comprises a plurality of: Mood, Severity, and Middle and Late 
Sleep. 

13. The method according to claim 1, further comprising: before step a), 
providing a set of known baseline patient profiles and treatment outcomes, 
which known profiles and outcomes are used in step c) for deriving said 
model. 

14. The method according to claim 13, wherein said model of step c) is a 
neural network. 



15. A method of treating depression in a patient comprising the following 
steps: 

defining a set of a plurality of predictor variables which define the data of 
a baseline patient profile, said set of predictor variables comprising 
predictive symptoms and a set of treatment options; 

developing an outcome prediction for said set of treatment options, said 
outcome prediction based on an analysis of patient symptoms; 

selecting a first preferred treatment option from said set of treatment 
options based on said outcome prediction; 

applying said first preferred treatment option to the patient; and 

monitoring the patient by comparing a response of the patient to said 
treatment option to said outcome prediction to provide an updated outcome 
prediction for the patient. 

16. The method of claim 15 further including the step of selecting a 
second preferred treatment option from said set of treatment options based 
on said updated outcome prediction. 
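
As an illustration only, two of the computations recited in the claims can be sketched in Python: the interaction estimate of claim 7 (products of symptom severities) and the influence measure of claim 10 (the sum of two weight products for a network with two hidden units). The function names, the example weights, and the choice to rank by absolute influence are assumptions made for this sketch and are not taken from the specification.

```python
from itertools import combinations


def interaction_terms(severities):
    """Pairwise products of symptom severities (interaction estimate of claim 7)."""
    names = sorted(severities)
    return {f"{a}*{b}": severities[a] * severities[b]
            for a, b in combinations(names, 2)}


def influence(input_to_hidden, hidden_to_output):
    """Sum over hidden units of (input-to-hidden weight) x (hidden-to-output weight).

    With two hidden units this is the "first product" plus the
    "second product" described in claim 10.
    """
    return sum(w_in * w_out
               for w_in, w_out in zip(input_to_hidden, hidden_to_output))


def rank_by_influence(weights_in, hidden_to_output):
    """Order symptoms from most to least influential.

    Assumption: ranking uses the absolute value of the influence.
    """
    scores = {name: influence(w, hidden_to_output)
              for name, w in weights_in.items()}
    return sorted(scores, key=lambda name: abs(scores[name]), reverse=True)


# Hypothetical weights for illustration only (not from any trained model).
WEIGHTS_IN = {"Mood": (0.5, -0.25), "Work": (0.25, 0.5), "Energy": (-0.5, 0.5)}
HIDDEN_TO_OUTPUT = (2.0, 4.0)

ranking = rank_by_influence(WEIGHTS_IN, HIDDEN_TO_OUTPUT)
products = interaction_terms({"Mood": 2, "Work": 3, "Energy": 1})
```

In this sketch a symptom whose two weight products cancel (Mood in the example) receives zero influence even though each individual weight is non-zero, which is a property of the summed-product measure itself.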



ABSTRACT 



A method is provided for facilitating the choice of a treatment or treatment 
regime and for predicting the outcome of a treatment for a disorder which 
is diagnosed and monitored by a physician or other appropriately trained and 
licensed professional, such as, for example, a psychologist, based upon the 
symptoms experienced by a patient. Unipolar depression is an example of 
such a disorder; however, the model may find use with other disorders and 
conditions wherein the patient response to treatment is variable. In the 
preferred embodiment, the method for predicting patient response includes 
the steps of performing at least one measurement of a symptom on a patient 
and measuring that symptom so as to derive a baseline patient profile, such 
as, for example, determining the symptom profile with time; defining a set 
of a plurality of predictor variables which define the data of the baseline 
patient profile, wherein the set of predictor variables includes predictive 
symptoms and a set of treatment options; deriving a model that represents 
the relationship between patient response and the set of predictor 
variables; and utilizing the model to predict the response of said patient to 
a treatment. A neural net architecture is utilized to define a non-linear, 
second order model which is utilized to analyze the patient data and 
generate the predictive database from entered patient data. 
Depression Disorder Integrated Model (DDIM) 

A method of choosing a treatment based on outcome predictions made for each 
treatment option, followed by a method of generating a profile for monitoring 
patient response to the specific treatment, based on a method for predicting 
treatment specific response patterns. 



Diagnosis and other baseline data are collected and transmitted as input to 
the "Outcome Predictor." The outcome predictor has been trained on other 
(prior) patients' baseline and outcome data. Predictions and confidence limits 
are generated for a number of treatment options. 

A treatment selection suggestion is made by the system for evaluation and 
consideration by a clinician, who has the final responsibility for the 
treatment selection. 

The baseline data and the chosen treatment are input to the Pattern 
Predictor 
[Figure 3-2 diagram: Training Cycle for Each Treatment Group. Parameters are 
updated in turn on the symptom data of patients 1 through 6, producing a 
treatment specific recovery model.] 

Figure 3-2: Training cycle for each treatment group. Each of the two models, with separate sets 
of parameters but with the same architecture, was trained by cycling through individual data within 
the respective treatment group. After each cycle, the cost function (which reflects the degree of fit 
of the model predictions to the actual data) was evaluated to determine the completion of training. 
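
The training cycle described in this caption can be sketched as follows. This is a minimal illustration, not the recovery model itself: it assumes a hypothetical one-parameter model (prediction = w * x), a squared-error cost, a gradient update per patient, and a stopping rule based on the change in cost between cycles. None of these specifics are stated in the caption.

```python
def cost(w, patients):
    """Mean squared discrepancy between model predictions and actual data."""
    return sum((w * x - y) ** 2 for x, y in patients) / len(patients)


def train(patients, w=0.0, lr=0.1, tol=1e-9, max_cycles=1000):
    """Cycle through the patients of one treatment group until the cost settles.

    Assumptions: a single parameter w, gradient updates, and a stopping
    rule on the cycle-to-cycle change in cost.
    """
    prev = cost(w, patients)
    for cycle in range(1, max_cycles + 1):
        for x, y in patients:                 # one parameter update per patient
            w -= lr * 2 * (w * x - y) * x
        cur = cost(w, patients)               # cost evaluated after each cycle
        if abs(prev - cur) < tol:
            break
        prev = cur
    return w, cur, cycle


# Six hypothetical patients (x, y) generated from y = 0.5 * x.
GROUP = [(1.0, 0.5), (0.8, 0.4), (0.6, 0.3), (0.9, 0.45), (0.7, 0.35), (0.5, 0.25)]
w_fit, final_cost, cycles_used = train(GROUP)
```

A separate call to `train` per treatment group would give each group its own parameter set while sharing the same model form, which mirrors the arrangement the caption describes.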
Figure 3-3: Overview of the recovery model. Ellipses represent symptom factors; arrows 
between symptom factors indicate that each symptom factor can influence every other symptom 
factor. 
Figure 3-4: The annotated second order differential equation used to model the pattern of 
recovery. 
[Figure 3-5 plot label: Treatment onset] 

Figure 3-5: Direct effects of treatment are either immediate (step function) or delayed (sigmoid 
function). Delays are estimated for each treatment from the patient data, using an optimization 
procedure. 
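
The two shapes of direct effect named in this caption can be written down as a short sketch. The function names, the unit amplitude, and the `steepness` parameter are assumptions for illustration; the caption states only that the immediate effect is a step function and the delayed effect is a sigmoid whose delay is estimated from patient data.

```python
import math


def immediate_effect(t, onset=0.0):
    """Step function: zero before treatment onset, one from onset onward."""
    return 1.0 if t >= onset else 0.0


def delayed_effect(t, delay, steepness=1.0, onset=0.0):
    """Sigmoid centred `delay` time units after onset.

    `steepness` is a hypothetical shape parameter; the effect reaches half
    of its full strength exactly at t = onset + delay.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (t - onset - delay)))
```

In the recovery model each of these shapes would be scaled by a treatment-specific coefficient; the scaling is left out here to keep the sketch minimal.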
Figure 3-6: Direct effects and interactions in the recovery model. u_i represents the strength 
of the immediate effect of treatment on symptom node i; v_i represents the strength of the delayed 
effect of treatment on symptom i; and w_ji and w_ij represent the interaction between the symptoms: 
the strength of the effect of symptom i on symptom j and the strength of the effect of symptom j 
on symptom i, respectively. 
Figure 3-7: Overview of Training Process. Parameters were initialized with a regression matrix 
which was calculated from actual symptom values (ASV) by correlation and regression analyses. 
The model used these initial parameters to predict symptom values (model symptom values, MSV) 
of each patient from baseline. The optimization process iteratively modified parameters to minimize 
the discrepancy between MSV and ASV. 
Figure 3-8: A schematic description of the cost function L. The integrand 
has two terms. By minimizing the first term, discrepancies between 
estimated and actual patterns of recovery are minimized. The second term is a 
constraint term which states that the differential equation must hold. 
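
A discretized version of a two-term cost of this kind can be sketched as below. The actual differential equation of Figure 3-4 is not reproduced in this caption, so the sketch assumes a hypothetical linear second order equation x'' + a x' + b x = 0, central finite differences, and a weighting factor `lam` on the constraint term. All three are assumptions for illustration.

```python
import math


def cost_L(model, actual, dt, a, b, lam=1.0):
    """Fit term plus constraint term, both integrated over time.

    model, actual: equally spaced samples of one symptom trajectory.
    Assumed constraint: x'' + a*x' + b*x = 0 (hypothetical form).
    """
    # First term: discrepancy between estimated and actual recovery pattern.
    fit = sum((m - y) ** 2 for m, y in zip(model, actual)) * dt

    # Second term: squared residual of the differential equation.
    constraint = 0.0
    for i in range(1, len(model) - 1):
        d2 = (model[i + 1] - 2 * model[i] + model[i - 1]) / dt ** 2
        d1 = (model[i + 1] - model[i - 1]) / (2 * dt)
        residual = d2 + a * d1 + b * model[i]
        constraint += residual ** 2 * dt
    return fit + lam * constraint


# Hypothetical trajectory x(t) = exp(-t), which satisfies x'' + 2x' + x = 0.
DT = 0.01
TRAJECTORY = [math.exp(-i * DT) for i in range(501)]
```

A trajectory that both matches the data and satisfies the assumed equation gives a cost near zero; moving the model away from the data, or off the solution family of the equation, raises one term or the other.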
[Figure 3-10 panels: (a) CBT patient 180; (b) CBT patient 195; (c) DMI patient 155; 
(d) DMI patient 181] 

Figure 3-10: Individual patterns of recovery. Plots, except the one at the bottom right, 
show patterns of recovery. Solid lines show actual patterns, dotted lines show predicted patterns. 
Numbers shown on the vertical axis are scaled such that the possible maximum symptom factor value 
yields 1.0. The plot at the bottom right shows the error (L) on the ordinate axis plotted against 
number of training cycles on the abscissa. 
[Figure 3-9 panels: (a) Predicted patterns of recovery by shunting equations; 
(b) Predicted patterns of recovery by second order equations] 

Figure 3-9: Predicted patterns of recovery by (a) shunting and (b) second order 
equations. In both (a) and (b), plots except the sub-plot at the bottom right show 
patterns of recovery. Solid lines show actual patterns, dotted lines show predicted 
patterns. Numbers shown on the vertical axis are scaled such that the possible maximum 
symptom factor value yields 1.0. The plot at the bottom right in both (a) and (b) 
shows the error (L) on the ordinate axis plotted against number of training cycles 
on the abscissa. Note that the absolute values of the error measure L cannot be 
compared between shunting and second order equations, because the latter includes 
errors in the derivatives of L. 
[Figure 3-11 panels: (a) Predicted and actual mean patterns of recovery (CBT); 
(b) Predicted and actual mean patterns of recovery (DMI)] 

Figure 3-11: Predicted and actual recovery patterns of the patient data mean: (a) mean of six 
CBT responders; (b) mean of six DMI responders. In both (a) and (b), all plots except the 
sub-plots at the bottom right show mean patterns of recovery. Solid lines show actual mean patterns, 
dotted lines show predicted mean patterns. Numbers shown on the vertical axis are scaled such that 
the maximum possible symptom factor value yields 1.0. The plot at the bottom right in both (a) 
and (b) shows the error L plotted on the ordinate axis against number of training cycles on the 
abscissa. Pat 0 indicates that the patient-averaged data were used. 
Figure 3-13: Predicted CBT and DMI temporal response sequence of symptom improvement. 
The vertical axis shows the mean half reduction time in weeks; the horizontal axis has no meaning. 
Symptom names are placed vertically at their mean half reduction time. There is a significant difference 
(p < 0.05 after rounding) between half reduction times of energy and early sleep disturbance. There 
is a trend (p < 0.10 after rounding) for a split between cognitions and work in CBT responders. In 
DMI there were no significant differences (or trends) in the sequence. 
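
A half reduction time of the kind plotted in this figure can be computed from a sampled trajectory as in the sketch below. The definition used here (the first time the symptom value falls to half of its baseline value, with linear interpolation between samples) is an assumption for illustration; the caption does not give the exact formula.

```python
def half_reduction_time(weeks, values):
    """Time at which a symptom first falls to half of its baseline value.

    weeks, values: samples of one symptom trajectory, baseline first.
    Returns None if the baseline is not positive or half reduction is
    never reached. Linear interpolation between samples is assumed.
    """
    if values[0] <= 0:
        return None
    target = values[0] / 2.0
    for i in range(1, len(values)):
        if values[i] <= target:
            t0, t1 = weeks[i - 1], weeks[i]
            v0, v1 = values[i - 1], values[i]
            # v0 > target >= v1, so the denominator is strictly positive.
            return t0 + (v0 - target) * (t1 - t0) / (v0 - v1)
    return None
```

Averaging this quantity over the responders of one treatment group would give one mean half reduction time per symptom, which is the quantity the vertical axis of the figure is described as showing.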
[Figure 3-14 panels: (a) Immediate Treatment Effects; (b) Delayed Treatment Effects] 

Figure 3-14: Comparison of the model's predicted (a) immediate and (b) delayed (latent) direct 
effects of treatment on symptoms for Cognitive Behavioral Therapy (CBT) and Desipramine (DMI). A solid line 
represents CBT coefficient values, a dashed line represents DMI coefficient values. Symptoms are 
represented along the x-axis. The coefficient values from the parameter optimization procedure indicate 
the strength of the effect on the symptom at the time the effect takes place, and are placed on the 
y-axis. For example, the delayed effect on cognitions for desipramine occurs at 3.4 weeks with a 
magnitude of almost 1.5, whereas the delayed effect on cognitions for CBT takes place at 1.2 weeks 
and has a magnitude of about 4.2. A zero indicates that the model predicted the symptom would 
worsen initially. 
Figure 3-15: Graphic representation of the sequence of symptom factors in recovery with 
Cognitive Behavioral Therapy treatment for the second order system. Vertical positions of the 
symptoms represent half-way-reduction time, arrows represent strong impacts and interactions, 
and corresponding numbers indicate the strength of the impact or interaction. See text for formula. 
Figure 3-16: Graphic representation of the sequence of symptom factors in recovery with 
Desipramine treatment for the second order model. Vertical positions of the symptoms represent 
half-way-reduction time, arrows represent strong impacts and interactions, and corresponding 
numbers indicate the strength of the impact or interaction. Dotted arrows show the interactions that 
operate in loops. See text for formula. 
Table 4.1: Prior attempts to predict outcome from Hamilton Depression Rating Scale scores 
(symptoms) and severity (total HDRS) of depressive symptoms. Independent variables are the 
HDRS items described under the column Symptom. Numbers are indices into Table 4.2 where the 
citations are listed and indicate that the symptom at baseline was not found to be a significant 
predictor of outcome. Underlined numbers are also indices into Table 4.2, but indicate the symptom at 
baseline was found to predict outcome. Blank entries indicate that the significance of the symptom 
to predict outcome was not reported. (+) indicates greater intensity predicted better response; (-) 
indicates greater intensity predicted poorer response. See Table 4.2 for key to citation references. 
Other=S-adenosyl methionine (Carney et al., 1986), ECT or unspecified antidepressant 
medication (Hinrichsen & Hernandez, 1993), unspecified antidepressant medication (Katon et al., 1994); 
Tricyclic=tricyclic antidepressant medications; MAOI=monoamine oxidase inhibitor antidepressant 
medications; SSRI=selective serotonin reuptake inhibitors; ECT=electroconvulsive therapy; 
Ami=amitriptyline; Nor=nortriptyline; IMI=imipramine; DMI=desipramine; Clo=clomipramine; 
Lev=levoprotiline; Map=maprotiline; Phe=phenelzine; Tran=tranylcypromine; Flu=fluoxetine; 
IPT=interpersonal therapy; CBT=cognitive behavioral therapy. 
Figure 4-1: Nonlinear mapping of backpropagation. Each hidden node finds a direction in the 
input space (illustrated by an arrow perpendicular to a small square piece) to which the output 
is sensitive. The output of each hidden node goes through a nonlinear output function before 
being weighted and summed at the output node. 
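
The mapping described in this caption can be illustrated with a small forward-pass sketch. The use of tanh as the nonlinear output function and the function names are assumptions; the caption says only that each hidden node responds along one direction of the input space and that hidden outputs are weighted and summed at the output node.

```python
import math


def hidden_node(x, direction, bias=0.0):
    """Project the input onto the node's direction, then apply a nonlinearity.

    Assumption: tanh stands in for the unspecified nonlinear output function.
    """
    projection = sum(w * xi for w, xi in zip(direction, x)) + bias
    return math.tanh(projection)


def network_output(x, hidden_nodes, output_weights):
    """Weighted sum of hidden node outputs.

    hidden_nodes: list of (direction, bias) pairs, one per hidden node.
    """
    activations = [hidden_node(x, d, b) for d, b in hidden_nodes]
    return sum(w * a for w, a in zip(output_weights, activations))
```

Because a hidden node sees the input only through its projection, moving an input point perpendicular to the node's direction leaves that node's output unchanged, which is the geometric point the figure makes with its arrow and square.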
Figure 4-2: Schematic representation of the effect of normalizing transformations on reducing 
nonlinearity of score-to-output relationships (or skewness of distributions). In the transformation, 
the area under the curve is preserved. The transformation redistributes the position of the data 
values along the x-axis in order to preserve the areas under the curve between adjacent score values 
while redistributing these data to best approximate a normal distribution. Equal areas under the 
curve between percentiles map to equal areas under the curve in the new distribution. 
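
One common way to realize a percentile-preserving normalizing transformation is a rank-based mapping onto normal quantiles, sketched below. This is offered as an assumption about how such a transformation could be implemented, not as the procedure actually used; in particular, the mid-rank percentile convention for tied scores is a choice made for this sketch.

```python
from bisect import bisect_left, bisect_right
from statistics import NormalDist


def normalize_scores(scores):
    """Map each score to the standard normal quantile of its percentile.

    Tied scores share the midpoint of the percentile band they occupy
    (an assumed convention), so equal areas between percentiles are kept.
    """
    n = len(scores)
    ranked = sorted(scores)
    standard_normal = NormalDist()
    z_by_value = {}
    for value in set(scores):
        lo = bisect_left(ranked, value)    # number of scores below this value
        hi = bisect_right(ranked, value)   # number of scores at or below it
        mid_percentile = (lo + hi) / (2 * n)   # always strictly between 0 and 1
        z_by_value[value] = standard_normal.inv_cdf(mid_percentile)
    return [z_by_value[s] for s in scores]
```

The mapping is monotone, so the ordering of patients on a symptom score is unchanged; only the spacing between score values is redistributed.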